Abstract

In spite of the huge literature on deconvolution problems, very little
has been done for hybrid contexts where signals are quantized. In this paper we
undertake an information-theoretic approach to the deconvolution problem
of a simple integrator with quantized binary input and sampled noisy
output. We recast it as a decoding problem, and we propose and analyze
(theoretically and numerically) some low-complexity on-line algorithms to
achieve deconvolution.

Keywords: Hybrid deconvolution systems, Input estimation, Bit-MAP 
decoding. 

1 Introduction 

The deconvolution problem is ubiquitous in many scientific and technological
areas such as seismology, astrophysics, image processing and medical applications
(see e.g. [21 G2 HI [101 US US]). Its most general formulation is as follows.
We consider a time horizon $T$ (possibly infinite), a convolution kernel $\mathcal{K}(t)$ and
the input/output system

$$x(t) = \int_0^t \mathcal{K}(t-s)\,u(s)\,ds, \qquad t \in [0,T] \tag{1}$$

(we implicitly assume that $\mathcal{K}$ and $u$ are such that the above integral makes sense).
The problem is to estimate the input $u$ from some noisy version $y$ of the output
$x$.

This is an instance of an inverse problem: to see why the problem is difficult we
focus on the special case $\mathcal{K} \equiv 1$, which will be the case considered throughout
this paper. In this context, (1) can be written as

$$\dot{x}(t) = u(t), \qquad x(0) = 0.$$

Since differentiation is not robust with respect to noise perturbations,
the reconstruction of $u$ from $y$ cannot simply be done by differentiation.
The goal is then to estimate $u$, using the available information on $x$ and any
a priori information on $u$. Several procedures can be exploited to accomplish
this task, and the choice is in general motivated by a suitable trade-off between
precision of the solution and complexity of the algorithm.
Classical algorithms due to Tikhonov [3TJ [52] are based on a penalization
technique and work off-line: the estimate $\hat{u}$ at any time depends on the whole
signal $y(t)$ with $t \in [0, T]$. This is a significant drawback in on-line or interactive
data-flow applications, where the delay in estimation is required to remain
bounded. Causal algorithms have been studied in [TJ [H], where bounds on the
error have been obtained for the case of bounded noise and under regularity
assumptions on the input signal $u$.

An outstanding problem is how to exploit possible side information available
on the input signal $u(t)$ in the above algorithms: indeed, while functional, and
more generally convex, constraints can be incorporated in the above algorithms,
things are much less clear for more general constraints. In this paper we focus on
the case when $u$ is known to be a piecewise constant signal with values restricted
to a fixed, known, finite alphabet. This turns out to be a significant issue
in the context of hybrid systems, where continuous-time systems are driven by
discrete digital signals. Such constraints are clearly of a non-convex type and it is
not obvious how to include them in classical deconvolution algorithms.

In this work we will undertake an information-theoretic approach to causal 
deconvolution problems with sampled quantized inputs introducing algorithms 
which reconstruct u through a decoding procedure. A key feature of these al- 
gorithms is that they present very low complexity structure, while they exhibit 
performance quite close to the information theoretical limit. The main math- 
ematical results consist in a rigorous analysis of the asymptotic performance 
of the proposed algorithms employing tools from the ergodic theory of Markov 
Processes. 

In Section 2 we will give all the mathematical details regarding the decon- 
volution problem with quantized input signals. In particular, we will link it to 
classical decoding problems and we will study the possibility to use classical 
decoding techniques for our purpose. In Section 3 we will develop a couple of 
low complexity deconvolution algorithms comparing their performance. Section 
4 is the core of our paper: it is devoted to a deep analysis of the proposed 
algorithms. Using ergodic theorems for Markov Processes we will be able to give
theoretical results on their behavior in the asymptotic regime (time range going
to $\infty$).

We conclude now the introduction with notation and terminology to be used 
throughout the paper. 



1.1 Notation 

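Given a subset $A$ of a set $\Omega$, $\mathbb{1}_A : \Omega \to \{0, 1\}$ is the indicator function, defined
by $\mathbb{1}_A(x) = 1$ if $x \in A$ and $\mathbb{1}_A(x) = 0$ otherwise. Erfc indicates the complementary
error function, defined by $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{+\infty} e^{-s^2}\,ds$ for any $x \in \mathbb{R}$. $\mathcal{B}(\Omega)$
indicates the Borel $\sigma$-algebra of $\Omega$.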

Capital letters will be used to name random variables (r.v.'s for short), while 
boldface capital letters will be vectors whose components are r.v.'s. 

$\mathbb{P}$ will be the probability on discrete r.v.'s, while $f(\cdot)$ the probability density
function of continuous or hybrid (that is, involving both continuous and discrete
events) r.v.'s. Instead, $\mathbf{P}$ will denote the transition probability matrix of a
Markov Chain (Section 4.1.1) and $P(\cdot, \cdot)$ the transition probability kernel of a
Markov Process (Section 4.2.1). Finally, $\mathbb{E}$ will be the mean operator.
2 Statement of the problem



2.1 The deconvolution problem 

In the following we stick to the system (1) under the assumptions we make
throughout this paragraph.

Assumption 1 The available output signal is a noisy, sampled version of $x(t)$:

$$y_k = x_k + n_k$$

where $x_k = x(\tau k)$, $\tau > 0$ being the constant sampling time, and the $n_k$'s are realizations
of independent, identically distributed Gaussian variables $N_k$'s of mean 0
and variance $\sigma^2$.

We will denote by $\mathbf{y} = (y_1, \dots, y_K) \in \mathbb{R}^K$ the vector of all available measures
($K = T/\tau$ is assumed to be an integer) and by $\mathbf{y}_a^b = (y_a, y_{a+1}, \dots, y_b)$ the
available measures from time $a$ to time $b$, with $a, b \in \{1, \dots, K\}$, $a < b$.
A deconvolution algorithm consists in a function

$$\Gamma : \mathbb{R}^K \to \mathbb{R}^{[0,T[}.$$

$\hat{u} = \Gamma(\mathbf{y})$ is the estimated input and in general it will not coincide with the true
input $u$. What in general we request is a bound on the error $\hat{u} - u$ and some
consistency property: when the variance of the noise and the sampling time go
to 0, the error should converge (in some suitable sense) to 0.

We say that a deconvolution algorithm $\Gamma$ is causal (with delay $k_0\tau$, $k_0 \in \mathbb{N}$)
if there exists a sequence of functions $\Gamma_k : \mathbb{R}^{k+k_0} \to \mathbb{R}^{[(k-1)\tau, k\tau[}$, where $k = 1, 2, \dots$,
such that

$$\Gamma(\mathbf{y})|_{[(k-1)\tau, k\tau[} = \Gamma_k(\mathbf{y}_1^{k+k_0}).$$

Such an algorithm estimates the unknown signal in the current time interval
$[(k-1)\tau, k\tau[$ exploiting the past and present information $y_1, \dots, y_k$ along with
a possible bounded future information $y_{k+1}, \dots, y_{k+k_0}$.
We now come to the assumptions on the input signals. 

Assumption 2 There is a finite alphabet $\mathcal{U} \subset \mathbb{R}$ and we consider signals of
type

$$u(t) = \sum_{k=0}^{K-1} u_k \mathbb{1}_{[k\tau,(k+1)\tau[}(t), \qquad u_k \in \mathcal{U}. \tag{3}$$

$u(t)$, with $t \in [0,T[$, is then completely determined by the sequence of samples
$u_0, u_1, \dots, u_{K-1}$. For simplicity we assume the sampling time $\tau$ to be the same
as in the output and to have an exact synchronization in the sampling instants.
The output signals are now identified by samples $x_1, x_2, \dots, x_K \in \mathcal{X}$, where
$\mathcal{X} \subset \mathbb{R}$ is a suitable alphabet (recall that we have fixed $x_0 = 0$). Of course,
in principle, one could still use the deconvolution algorithms in pj |5] or [STJ
22; however, there would be no way to use inside the algorithm the a priori
information on the quantization of $u$. Instead we now show that, in this case,
our deconvolution problem can be completely recast into a discrete decoding
problem. Notice indeed that the input/output system is simply described by

$$x_0 = 0, \qquad x_{k+1} = x_k + \tau u_k, \quad k = 0, \dots, K-1. \tag{4}$$
The vector $\mathbf{x} = (x_1, \dots, x_K)$ can thus be seen as a coded version of $\mathbf{u} =
(u_0, \dots, u_{K-1})$: we can write $\mathbf{x} = \mathcal{E}(\mathbf{u})$ where $\mathcal{E}$ denotes the encoder given
by (4). Afterwards, $\mathbf{x}$ is transformed as if it were transmitted through a classical
Additive White Gaussian Noise (AWGN) channel, the received output being
given by $y_k = x_k + n_k$.
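The coding interpretation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `encode` and `awgn`, the seed and the noise level are assumptions made here, not part of the original work.

```python
import random

def encode(u, tau=1.0):
    """Integrator 'encoder' (4): x_0 = 0, x_{k+1} = x_k + tau * u_k.

    Returns the coded sequence [x_1, ..., x_K].
    """
    x, coded = 0.0, []
    for u_k in u:
        x += tau * u_k
        coded.append(x)
    return coded

def awgn(x, sigma, rng):
    """AWGN channel: y_k = x_k + n_k with n_k ~ N(0, sigma^2), i.i.d."""
    return [x_k + rng.gauss(0.0, sigma) for x_k in x]

# Illustrative run with an arbitrary seed and noise level.
rng = random.Random(0)
u = [rng.randint(0, 1) for _ in range(8)]   # binary input symbols
x = encode(u)                               # noiseless sampled output
y = awgn(x, sigma=0.5, rng=rng)             # what the decoder observes
```

Note that the "codeword" `x` is just the running sum of the input symbols, which is why an error on one symbol affects the interpretation of all later samples.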

It is on the basis of these measures that we have to estimate the 'information
signal' $\mathbf{u}$. Notice that the real time $t$ is completely out of the problem at this
point and everything can be considered at the discrete sampling clock time. In
the coding theory language, a decoder is exactly a function $\mathcal{D} : \mathbb{R}^K \to \mathcal{U}^K$
which allows us to construct an estimate of the input signal: $\hat{\mathbf{u}} = \mathcal{D}(\mathbf{y})$. Even
in this context we can talk about a causal algorithm if there exists a sequence of
functions $\mathcal{D}_k : \mathbb{R}^{k+k_0} \to \mathcal{U}$ such that

$$\mathcal{D}(\mathbf{y})_{k-1} = \mathcal{D}_k(\mathbf{y}_1^{k+k_0}), \qquad k = 1, \dots, K.$$

Finally, 

Assumption 3 The unknown input is assumed to be generated by a stochastic 
source with a known distribution, independent from the noise source. 

The particular source distribution considered in this work will be introduced in
Section 2.4.

According to the notation given in Section 1.1, in the sequel $U_k$ will identify
the input r.v. at time $k$, $X_k$ the corresponding system output given by expression
(4), $Y_k = X_k + N_k$ the measured output, $N_k$ being the Gaussian noise. Furthermore,
$\hat{U}_k = \mathcal{D}(\mathbf{Y})_k$ and $\hat{X}_k = \hat{X}_{k-1} + \tau \hat{U}_{k-1}$ ($\hat{X}_0 = 0$) will be respectively
the estimated input and the estimated state. Finally, $\mathbf{U} = (U_0, \dots, U_{K-1})$,
$\hat{\mathbf{U}} = (\hat{U}_0, \dots, \hat{U}_{K-1})$, $\mathbf{Y} = (Y_1, \dots, Y_K)$, $\mathbf{Y}_a^b = (Y_a, \dots, Y_b)$, $a, b \in \{1, \dots, K\}$,
$a < b$.

2.2 Error Evaluation: The Mean Square Cost 

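A fundamental issue in the deconvolution problem is the choice of the norm with
respect to which errors are evaluated. In this context, we consider the mean
square cost:

$$d(\mathcal{D}) = \tau\,\mathbb{E}\left(\|\hat{\mathbf{U}} - \mathbf{U}\|^2\right) = \tau \sum_{k=0}^{K-1} \mathbb{E}\left(|\hat{U}_k - U_k|^2\right).$$

We now define $\mathcal{D}^*$ as the decoder minimizing $d(\mathcal{D})$ among all the possible decoders.
It can be constructed as follows: given the density $f_{\mathbf{Y}}(\mathbf{y})$ of $\mathbf{Y}$, notice
that

$$d(\mathcal{D}) = \tau \sum_{k=0}^{K-1} \int_{\mathbb{R}^K} \mathbb{E}\left(|U_k - \mathcal{D}(\mathbf{y})_k|^2 \mid \mathbf{Y} = \mathbf{y}\right) f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}.$$

Hence, for any $\mathbf{y} \in \mathbb{R}^K$,

$$\mathcal{D}^*(\mathbf{y})_k = \operatorname*{argmin}_{v \in \mathcal{U}} \mathbb{E}\left(|U_k - v|^2 \mid \mathbf{Y} = \mathbf{y}\right) = \operatorname*{argmin}_{v \in \mathcal{U}} \sum_{u \in \mathcal{U}} |u - v|^2\, \mathbb{P}(U_k = u \mid \mathbf{Y} = \mathbf{y}).$$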
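This turns out to be a finite optimization problem which can be solved by means
of a marginalization procedure and a Bayesian inversion:

$$\mathbb{P}(U_k = u \mid \mathbf{Y} = \mathbf{y}) = \sum_{\mathbf{u} \in \mathcal{U}^K :\, u_k = u} \frac{f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y} \mid \mathcal{E}(\mathbf{u}))\,\mathbb{P}(\mathbf{U} = \mathbf{u})}{f_{\mathbf{Y}}(\mathbf{y})}.$$

Analogously, we can define $\mathcal{D}^{*k_0}$ as the decoder minimizing $d(\mathcal{D})$ among all the
possible causal decoders with delay $k_0$:

$$\mathcal{D}^{*k_0}(\mathbf{y})_{k-1} = \mathcal{D}^{*k_0}_k(\mathbf{y}_1^{k+k_0}) = \operatorname*{argmin}_{v \in \mathcal{U}} \sum_{u \in \mathcal{U}} |u - v|^2\, \mathbb{P}(U_{k-1} = u \mid \mathbf{Y}_1^{k+k_0} = \mathbf{y}_1^{k+k_0}). \tag{5}$$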



2.3 The BCJR algorithm 

In practice, the decoder T>* can be implemented with the well-known BCJR 
algorithm [T] . This algorithm computes the probabilities of states and transi- 
tions of a Markov source, given the observed channel outputs; in other words, it 
provides the so-called APP (a posteriori probabilities) on states and transitions, 
therefore on coded and information symbols. 

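Let us briefly recall the BCJR procedure. For $i, j \in \mathcal{X}$, we define the
following probability density functions:

$$\begin{aligned}
\alpha_k(i) &= f_{(X_k, \mathbf{Y}_1^k)}(i, \mathbf{y}_1^k), & k &= 1, \dots, K\\
\beta_k(i) &= f_{(\mathbf{Y}_{k+1}^K | X_k)}(\mathbf{y}_{k+1}^K \mid i), & k &= 0, \dots, K-1\\
\Gamma_k(i,j) &= f_{(X_k, Y_k | X_{k-1})}(j, y_k \mid i), & k &= 1, \dots, K.
\end{aligned} \tag{6}$$

For any $k = 1, \dots, K$, the APP on states and on transitions respectively are:

$$\lambda_k(i) = f_{(X_k, \mathbf{Y})}(i, \mathbf{y}), \qquad \sigma_k(i,j) = f_{(X_k, X_{k-1}, \mathbf{Y})}(j, i, \mathbf{y}).$$

Given the following initial and final conditions:

$$\alpha_0(i) = \mathbb{P}(X_0 = i) = \begin{cases} 1 & \text{if } i = 0\\ 0 & \text{otherwise,} \end{cases} \qquad \beta_K(i) = 1 \ \text{ for any } i \in \mathcal{X},$$

for $k = 1, \dots, K$ we have

$$\lambda_k(i) = \alpha_k(i)\beta_k(i), \qquad \sigma_k(i,j) = \alpha_{k-1}(i)\Gamma_k(i,j)\beta_k(j) \tag{7}$$

where $\alpha_k(i)$ and $\beta_k(i)$, $i \in \mathcal{X}$, can be respectively computed with a forward and
a backward recursion:

$$\alpha_k(i) = \sum_{h \in \mathcal{X}} \alpha_{k-1}(h)\Gamma_k(h,i), \qquad \beta_k(i) = \sum_{h \in \mathcal{X}} \Gamma_{k+1}(i,h)\beta_{k+1}(h). \tag{8}$$

The APP are then recursively computed and finally used to decide on the
transmitted input sequence.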
Analogous causal versions of the BCJR algorithm can be used to implement
the decoder $\mathcal{D}^{*k_0}$ with delay $k_0$. For $k = 1, \dots, K - k_0$, the APP on the transitions
becomes

$$\tilde{\sigma}_k(i,j) = f_{(X_k, X_{k-1}, \mathbf{Y}_1^{k+k_0})}(j, i, \mathbf{y}_1^{k+k_0}) = \alpha_{k-1}(i)\Gamma_k(i,j)\tilde{\beta}_k(j) \tag{9}$$

where $\alpha_k$ and $\Gamma_k$ are defined as above, while $\tilde{\beta}_k(j) = f_{(\mathbf{Y}_{k+1}^{k+k_0} | X_k)}(\mathbf{y}_{k+1}^{k+k_0} \mid j)$. For
$k > K - k_0$, we fall back on the classical formulation (7). For brevity, we will
refer to the causal BCJR as CBCJR.

2.4 Further Assumptions 

In the sequel of this work, we will make two further assumptions on the input:

Assumption 4 The input alphabet is binary: $\mathcal{U} = \{0, 1\}$.

Assumption 5 For $k = 0, \dots, K-1$, the $U_k$'s are independent and uniformly
distributed: $\mathbb{P}(U_k = 0) = \mathbb{P}(U_k = 1) = \frac{1}{2}$; in particular, the $U_k$'s are independent
from the Gaussian noises $N_k$'s.

Now the probabilistic setting introduced at the end of Section 2.1 is complete
and we can summarize the system as follows: given $X_0 = \hat{X}_0 = 0$, for $k = 1, \dots, K$,

$$\begin{aligned}
&U_{k-1} \sim \mathrm{Bernoulli}(1/2);\\
&X_k = X_{k-1} + \tau U_{k-1};\\
&N_k \sim \mathcal{N}(0, \sigma^2);\\
&Y_k = X_k + N_k;\\
&\hat{U}_{k-1} = \mathcal{D}(\mathbf{Y})_{k-1};\\
&\hat{X}_k = \hat{X}_{k-1} + \tau \hat{U}_{k-1}.
\end{aligned} \tag{10}$$

Notice that the $X_k$'s, too, are independent from the $N_k$'s.
Under Assumption 4,

$$d(\mathcal{D}) = \tau \sum_{k=0}^{K-1} \mathbb{E}\left(|\hat{U}_k - U_k|^2\right) = \tau K P_b(e)$$

where

$$P_b(e) = \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{P}(\hat{U}_k \neq U_k) = \frac{1}{K}\,\mathbb{E}\left(\|\hat{\mathbf{U}} - \mathbf{U}\|^2\right) \tag{11}$$

is the so-called Bit Error Rate (also denoted by BER), a very common performance
measure in digital transmissions that expresses the average fraction of
bits in error. In our context, minimizing $d(\mathcal{D})$ is equivalent to minimizing the
BER and, therefore, the optimal decoder $\mathcal{D}^*$ that performs this minimization
coincides with the well-known Bit-MAP (Maximum a posteriori) decoder (see 

nam): 

$$\mathcal{D}^*(\mathbf{y})_k = \operatorname*{argmax}_{u \in \{0,1\}} \mathbb{P}(U_k = u \mid \mathbf{Y} = \mathbf{y}). \tag{12}$$
At step k    Computations    Storage Locations    Decoding Delay
BCJR         O(k)            O(k)                 K-k
CBCJR        O(k)            O(k)                 k_0 = 0

Table 1:

Its causal version is given by

$$\mathcal{D}^{*k_0}(\mathbf{y})_k = \operatorname*{argmax}_{u \in \{0,1\}} \mathbb{P}(U_k = u \mid \mathbf{Y}_1^{k+1+k_0} = \mathbf{y}_1^{k+1+k_0}). \tag{13}$$

We introduce here also the Conditional Bit Error Rate, CBER for short:

$$P_b(e \mid \mathbf{U}) = \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{P}(\hat{U}_k \neq U_k \mid \mathbf{U}) = \frac{1}{K}\,\mathbb{E}\left(\|\hat{\mathbf{U}} - \mathbf{U}\|^2 \mid \mathbf{U}\right). \tag{14}$$

While the BER is a parameter that evaluates the mean performance of the
transmission model, the CBER describes its behavior for each possible sent
sequence. The CBER is then a relevant parameter for our system, whose decoding
performance changes as a function of the transmitted input.
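For computational simplicity, from now onwards let

$$\tau = 1 \tag{15}$$

so that $\mathcal{X} = \{0, \dots, K\}$ and in particular, if $X_0 = 0$, $X_k \in \{0, \dots, k\}$. In
the BCJR implementation of decoders (12) and (13), we obtain that $\alpha_k(i)$,
$i = 0, 1, \dots, K$, is null for any $i > k$, while matrices $\Gamma_k$ and $\sigma_k$ are non-null only
on the diagonal and superdiagonal. By Assumption 5, $\mathbb{P}(X_k = j \mid X_{k-1} = i) = 1/2$
if $j = i, i+1$ and 0 otherwise. Recalling that the transition between $X_k$ and
$Y_k$ is modeled by an AWGN channel,

$$f_{(Y_k | X_k)}(y_k \mid j) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_k - j)^2}{2\sigma^2}\right),$$

we obtain

$$\Gamma_k(i,j) = f_{(Y_k | X_k)}(y_k \mid j)\,\mathbb{P}(X_k = j \mid X_{k-1} = i) = \frac{1}{2\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_k - j)^2}{2\sigma^2}\right) \quad \text{for } j = i, i+1. \tag{16}$$

Given $\Gamma_k$, $\sigma_k$ or its causal version $\tilde{\sigma}_k$ can be recursively computed and the
corresponding decoding rules are:

BCJR

$$\mathcal{D}^*(\mathbf{y})_{k-1} = \begin{cases} 0 & \text{if } \sum_{i=0}^{k-1} \sigma_k(i, i+1) \le \sum_{i=0}^{k-1} \sigma_k(i,i)\\ 1 & \text{otherwise.} \end{cases} \tag{17}$$

CBCJR

$$\mathcal{D}^{*k_0}(\mathbf{y})_{k-1} = \begin{cases} 0 & \text{if } \sum_{i=0}^{k-1} \tilde{\sigma}_k(i, i+1) \le \sum_{i=0}^{k-1} \tilde{\sigma}_k(i,i)\\ 1 & \text{otherwise.} \end{cases} \tag{18}$$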
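As an illustration of the decoding rules (17) and (18), the following Python sketch implements the forward and backward recursions (8) for the integrator with $\tau = 1$. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `gamma`, `_scale` and `bcjr_decode` and the `delay` parameter are hypothetical, the vectors are rescaled at each step to limit numerical underflow (which does not change the comparison), and the backward recursion is recomputed at every step for clarity rather than efficiency.

```python
import math

def gamma(y_k, j, sigma):
    """Branch metric (16): Gaussian likelihood of y_k at state j times the prior 1/2."""
    return 0.5 * math.exp(-(y_k - j) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def _scale(v):
    """Rescale by the largest entry; decisions only depend on ratios."""
    m = max(v)
    return [a / m for a in v] if m > 0 else v

def bcjr_decode(y, sigma, delay=None):
    """Bit-MAP decoding of the integrator with tau = 1.

    delay=None uses all K observations (rule (17));
    delay=k0 uses y_1..y_{k+k0} at step k (causal rule (18)).
    """
    K = len(y)
    alpha = [1.0]                      # alpha_0: all the mass on state 0
    u_hat = []
    for k in range(1, K + 1):
        end = K if delay is None else min(K, k + delay)
        # Backward recursion (8), from beta_end = 1 down to beta_k (states 0..k).
        beta = [1.0] * (end + 1)
        for m in range(end - 1, k - 1, -1):
            beta = _scale([gamma(y[m], i, sigma) * beta[i]
                           + gamma(y[m], i + 1, sigma) * beta[i + 1]
                           for i in range(m + 1)])
        g = [gamma(y[k - 1], j, sigma) for j in range(k + 1)]
        # Sums of the transition APPs sigma_k(i, i) and sigma_k(i, i+1).
        stay = sum(alpha[i] * g[i] * beta[i] for i in range(k))
        move = sum(alpha[i] * g[i + 1] * beta[i + 1] for i in range(k))
        u_hat.append(0 if move <= stay else 1)
        # Forward recursion (8): alpha_k from alpha_{k-1}.
        alpha = _scale([(alpha[i] if i < k else 0.0) * g[i]
                        + (alpha[i - 1] if i > 0 else 0.0) * g[i]
                        for i in range(k + 1)])
    return u_hat
```

With `delay=0` the backward factor is identically one, so the decision at step k uses only the forward quantities, which is the CBCJR setting considered in the simulations.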



3 Suboptimal Causal Decoding Algorithms 

Causality has a price and the CBCJR algorithm has clearly a worse performance 
than BCJR. 
Figure 1: BCJR vs CBCJR.



By simulating our system, we quantify the performance gap between BCJR
and CBCJR ($k_0 = 0$), as we can appreciate in Figure 1: the two curves represent
the corresponding BERs as a function of the Signal-to-Noise Ratio (SNR), here
defined as $\tau^2/\sigma^2 = 1/\sigma^2$. These outcomes are the averages over 5000 transmissions,
each of which is a 100-bit message (to avoid unacceptable delays and
complexity problems in the BCJR and CBCJR implementation). We remark
that CBCJR has the best performance among causal deconvolution algorithms.

Moreover, by comparing the efficiency of the two procedures (the results
are reported in Table 1), we gather that for both BCJR and CBCJR the required
computations and storage locations increase linearly with the number of
transmitted bits, which is a drawback in case of long transmissions.

This fact motivates the development of new suboptimal causal algorithms 
that improve the efficiency without substantial loss of reliability. To achieve 
that, we implement the CBCJR fixing the number of states, that is, at each 
step we save the n states with largest probability (where n is arbitrarily chosen) 
and we discard the others. 

We now introduce the algorithms in the cases n = 1 and n = 2, which are of 
great interest for their low complexity, and we show some simulations' outcomes. 



3.1 One State Algorithm 

A suboptimal causal decoder $\mathcal{D}^{(1)} : \mathbb{R}^K \to \{0, 1\}^K$ can be derived from the
CBCJR by assuming the most probable state to be the correct one. At any step
$k = 0, 1, \dots$, $\mathcal{D}^{(1)}$ decides on the current bit by a single MAP procedure and
updates the estimated state, which is the only value that needs to be
stored.

Consider (6), (7) and (18). Given the estimated state $\hat{x}_{k-1}$, the decoding
rule of $\mathcal{D}^{(1)}$ at time step $k$ is given by (18) with no backward recursion $\beta_k(j)$
and $\alpha_{k-1}(\hat{x}_{k-1}) = 1$, $\alpha_{k-1}(j) = 0$ for any $j \neq \hat{x}_{k-1}$. This reduces the decoding
task to the comparison between two distances; in fact, the One State algorithm
that implements $\mathcal{D}^{(1)}$ is as follows:

1. Initialization: $\hat{x}_0 = 0$;
Figure 2: Trellis representation of the Two States Algorithm.



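2. For $k = 1, \dots, K$, given the received symbol $y_k \in \mathbb{R}$,

$$\hat{u}_{k-1} = \mathcal{D}^{(1)}(\mathbf{y})_{k-1} = \operatorname*{argmax}_{u \in \{0,1\}} \mathbb{P}(U_{k-1} = u \mid Y_k = y_k, X_{k-1} = \hat{x}_{k-1}) = \begin{cases} 0 & \text{if } \Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1}) \ge \Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1} + 1)\\ 1 & \text{otherwise} \end{cases} \tag{19}$$

$$\hat{x}_k = \hat{x}_{k-1} + \hat{u}_{k-1}$$

and given the equality (16) in the AWGN case,

$$\Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1}) \ge \Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1} + 1) \iff |y_k - \hat{x}_{k-1}| \le |y_k - (\hat{x}_{k-1} + 1)|. \tag{20}$$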
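The two steps of the One State algorithm, with the distance comparison (20) in place of the metric comparison (19), can be sketched as follows for $\tau = 1$. The function name `one_state_decode` is an assumption made for this illustration; ties are resolved in favour of the bit 0.

```python
def one_state_decode(y):
    """One State decoding for tau = 1: a single estimated state is stored."""
    x_hat = 0            # step 1: initial estimated state
    u_hat = []
    for y_k in y:        # step 2: one nearest-neighbour decision per sample
        # Rule (20): stay in x_hat if y_k is at least as close to it as to x_hat + 1.
        bit = 0 if abs(y_k - x_hat) <= abs(y_k - (x_hat + 1)) else 1
        u_hat.append(bit)
        x_hat += bit     # state update: x_hat_k = x_hat_{k-1} + u_hat_{k-1}
    return u_hat

decoded = one_state_decode([0.9, 1.2, 2.1, 2.6])   # -> [1, 0, 1, 1]
```

Note that the noise variance does not enter the decision: in the AWGN case the rule only compares the sample with the midpoint between the two candidate states.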

3.2 Two States Algorithm 

By fixing $n = 2$, we derive a decoder $\mathcal{D}^{(2)} : \mathbb{R}^K \to \{0, 1\}^K$ that, at each
step, estimates the current input bit and computes and stores the two most
likely states along with the corresponding probabilities $\alpha_k(i)$ (defined by (6)).
As for the One State Algorithm, the estimation of the input bit is performed
by a MAP decoding rule (18) with no backward recursion and summing over
the two "surviving" states. In detail, the recursive Two States algorithm that
implements $\mathcal{D}^{(2)}$ is the following:

1. For $k = 1$, given the unique starting state $\hat{x}_0 = 0$, we estimate the first
bit by a One State procedure:

$$\hat{u}_0 = \mathcal{D}^{(2)}(\mathbf{y})_0 = \operatorname*{argmax}_{u \in \{0,1\}} \mathbb{P}(U_0 = u \mid Y_1 = y_1, X_0 = 0) = \begin{cases} 0 & \text{if } |y_1| < |y_1 - 1|\\ 1 & \text{otherwise.} \end{cases} \tag{21}$$

Afterwards, the possible states are two: $\hat{x}_1(0) = 0$ and $\hat{x}_1(1) = 1$, and the
corresponding probabilities $\alpha_1(0)$ and $\alpha_1(1)$ in our framework are given
by

$$\alpha_1(j) = f_{(X_1, Y_1)}(j, y_1) = f_{(Y_1|X_1)}(y_1 \mid j)\,\mathbb{P}(X_1 = j) = f_{(Y_1|X_1)}(y_1 \mid j)\,\mathbb{P}(U_0 = j) = \frac{1}{2} f_{(Y_1|X_1)}(y_1 \mid j), \qquad j \in \{0, 1\}.$$

We then normalize these probabilities so that $\alpha_1(0) + \alpha_1(1) = 1$ and we
just store the couple of values $(\alpha_1(0), \hat{x}_1(0))$, as this is sufficient to retrieve
also $(\alpha_1(1), \hat{x}_1(1)) = (1 - \alpha_1(0), \hat{x}_1(0) + 1)$. For notational simplicity we
rename the stored vector $(\alpha_1(0), \hat{x}_1(0))$ as $(\alpha_1, \hat{x}_1)$.

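2. For $k = 2, 3, \dots, K$, given $(\alpha_{k-1}, \hat{x}_{k-1})$ and $\Gamma_k$,

$$\hat{u}_{k-1} = \mathcal{D}^{(2)}(\mathbf{y})_{k-1} = \operatorname*{argmax}_{u \in \{0,1\}} \mathbb{P}(U_{k-1} = u \mid Y_k = y_k, \hat{X}_{k-1} = \hat{x}_{k-1}, A_{k-1} = \alpha_{k-1})$$

$$= \begin{cases} 0 & \text{if } \alpha_{k-1}\Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1}) + (1 - \alpha_{k-1})\Gamma_k(\hat{x}_{k-1} + 1, \hat{x}_{k-1} + 1) \ge \alpha_{k-1}\Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1} + 1) + (1 - \alpha_{k-1})\Gamma_k(\hat{x}_{k-1} + 1, \hat{x}_{k-1} + 2)\\ 1 & \text{otherwise.} \end{cases}$$

From step $k - 1$, three possible states arise: $\hat{x}_{k-1}$, $\hat{x}_{k-1} + 1$ and $\hat{x}_{k-1} + 2$,
whose probabilities are given by the forward recursion in (8):

$$\begin{aligned}
\alpha_k(\hat{x}_{k-1}) &= \alpha_{k-1}\Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1})\\
\alpha_k(\hat{x}_{k-1} + 1) &= \alpha_{k-1}\Gamma_k(\hat{x}_{k-1}, \hat{x}_{k-1} + 1) + (1 - \alpha_{k-1})\Gamma_k(\hat{x}_{k-1} + 1, \hat{x}_{k-1} + 1)\\
\alpha_k(\hat{x}_{k-1} + 2) &= (1 - \alpha_{k-1})\Gamma_k(\hat{x}_{k-1} + 1, \hat{x}_{k-1} + 2)
\end{aligned} \tag{22}$$

which can be reduced as follows in the case (16):

$$\begin{aligned}
\alpha_k(\hat{x}_{k-1}) &= \alpha_{k-1}\,\frac{1}{2\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_k - \hat{x}_{k-1})^2}{2\sigma^2}\right)\\
\alpha_k(\hat{x}_{k-1} + 1) &= \frac{1}{2\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_k - (\hat{x}_{k-1} + 1))^2}{2\sigma^2}\right)\\
\alpha_k(\hat{x}_{k-1} + 2) &= (1 - \alpha_{k-1})\,\frac{1}{2\sigma\sqrt{2\pi}} \exp\left(-\frac{(y_k - (\hat{x}_{k-1} + 2))^2}{2\sigma^2}\right).
\end{aligned}$$

Since $|y_k - (\hat{x}_{k-1} + 1)| \le \max\{|y_k - (\hat{x}_{k-1} + j)|,\ j = 0, 1, 2\}$, in the AWGN
case $\alpha_k(\hat{x}_{k-1} + 1) \ge \min\{\alpha_k(\hat{x}_{k-1} + j),\ j = 0, 1, 2\}$. Hence, the state
$\hat{x}_{k-1} + 1$ is never discarded and also the two "surviving" states are always
adjacent. Therefore,

• we calculate $\alpha_{\min} = \min\{\alpha_k(\hat{x}_{k-1}), \alpha_k(\hat{x}_{k-1} + 2)\}$.

• If $\alpha_{\min} = \alpha_k(\hat{x}_{k-1})$, the surviving states are $(\hat{x}_{k-1} + 1, \hat{x}_{k-1} + 2)$
with probabilities $(\alpha_k(\hat{x}_{k-1} + 1), \alpha_k(\hat{x}_{k-1} + 2))$. We then store the
lowest state along with the corresponding normalized probability:
$(\alpha_k, \hat{x}_k) = \left(\frac{\alpha_k(\hat{x}_{k-1} + 1)}{\alpha_k(\hat{x}_{k-1} + 1) + \alpha_k(\hat{x}_{k-1} + 2)},\ \hat{x}_{k-1} + 1\right)$.

• Similarly, if $\alpha_{\min} = \alpha_k(\hat{x}_{k-1} + 2)$, $(\alpha_k, \hat{x}_k) = \left(\frac{\alpha_k(\hat{x}_{k-1})}{\alpha_k(\hat{x}_{k-1}) + \alpha_k(\hat{x}_{k-1} + 1)},\ \hat{x}_{k-1}\right)$.

Remark 1 When the extreme case $\alpha_k = 1$ occurs, $\hat{x}_k + 1$ has null probability,
then $\hat{x}_{k+1} = \hat{x}_k$; analogously, when $\alpha_k = 0$, $\hat{x}_{k+1} = \hat{x}_k + 1$. In these cases the
Two States Algorithm actually behaves as the One State Algorithm.

Remark 2 As a consequence of Remark 1, the unique initial state $\hat{x}_0 = 0$
can be interpreted as a double state with all the probability in $\hat{x}_0 = 0$, that is,
$(\alpha_0, \hat{x}_0) = (1, 0)$.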
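The Two States recursion can be sketched in Python as below, for $\tau = 1$ and starting from $(\alpha_0, \hat{x}_0) = (1, 0)$ as suggested by Remark 2, so that the first step needs no special case. This is an illustrative sketch only: the name `two_states_decode` is hypothetical, the common factor $1/(2\sigma\sqrt{2\pi})$ of (16) is dropped because it cancels in every comparison and normalization, and the exponents are shifted by their maximum (plus a fallback value 0.5 if a normalizer underflows), which are numerical safeguards added here.

```python
import math

def two_states_decode(y, sigma):
    """Two States decoding for tau = 1: stores the lower surviving state
    x_hat and its normalized probability a."""
    a, x_hat = 1.0, 0                      # Remark 2: (alpha_0, x_hat_0) = (1, 0)
    u_hat = []
    for y_k in y:
        # Gaussian factors of (16) at x_hat, x_hat + 1, x_hat + 2 (common constants dropped).
        e = [-(y_k - (x_hat + j)) ** 2 / (2 * sigma ** 2) for j in range(3)]
        top = max(e)
        g0, g1, g2 = (math.exp(v - top) for v in e)
        # MAP decision summed over the two surviving states.
        stay = a * g0 + (1 - a) * g1
        move = a * g1 + (1 - a) * g2
        u_hat.append(0 if stay >= move else 1)
        # Forward recursion (22) for the three candidate states.
        a0, a1, a2 = a * g0, g1, (1 - a) * g2
        if a0 <= a2:                       # discard x_hat, keep (x_hat + 1, x_hat + 2)
            denom = a1 + a2
            a, x_hat = (a1 / denom if denom > 0 else 0.5), x_hat + 1
        else:                              # discard x_hat + 2, keep (x_hat, x_hat + 1)
            denom = a0 + a1
            a, x_hat = (a0 / denom if denom > 0 else 0.5), x_hat
    return u_hat

decoded = two_states_decode([0.9, 1.2, 2.1, 2.6], sigma=0.5)   # -> [1, 0, 1, 1]
```

Unlike the One State rule, the decision here depends on the noise level through the stored probability, which is what lets the decoder recover from a doubtful earlier decision.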
At step k     Computations    Storage Locations    Decoding Delay
BCJR          O(k)            O(k)                 K-k
CBCJR         O(k)            O(k)                 0
ONE STATE     O(1)            1                    0
TWO STATES    O(1)            2                    0

Table 2:

3.3 Simulations and comparisons 
Figure 3: Performance comparison of different causal decoders.

We now report the simulation outcomes concerning the decoders $\mathcal{D}^{*0}$, $\mathcal{D}^{(1)}$
and $\mathcal{D}^{(2)}$, respectively implemented with the CBCJR, One State and Two States
algorithms. The simulations have been performed considering 5000 different
transmissions, each of which is a 100-bit message. The reported results are
the averages over all transmissions.

In Figure 3 we compare the efficiency of the three decoding schemes in terms
of BER: the comparison shows that two states are sufficient to achieve performance very
close to the causal optimum. We observe that the gap between $\mathcal{D}^{(2)}$ and $\mathcal{D}^{*0}$
never exceeds 0.15 dB, while it reaches 0.8 dB between $\mathcal{D}^{(1)}$ and $\mathcal{D}^{*0}$ for BER
values between 0.2 and 0.3. Moreover, as we report in Table 2, the complexity of the
One State and Two States algorithms is constant in the number of transmitted bits
and no delay is produced in the decoding: this makes them efficient even for
long-time transmissions, i.e., for a large number of states.
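A Monte Carlo experiment of the kind described above can be sketched as follows for the One State decoder alone. The sketch is self-contained and makes several assumptions of its own: the function names, the seed, the reduced number of messages, and the conversion $\sigma = 10^{-\mathrm{SNR_{dB}}/20}$, which follows from the definition SNR $= 1/\sigma^2$. It is not meant to reproduce the curves of Figure 3.

```python
import random

def one_state_decode(y):
    """One State rule for tau = 1: nearest of the two reachable states."""
    x_hat, u_hat = 0, []
    for y_k in y:
        bit = 0 if abs(y_k - x_hat) <= abs(y_k - (x_hat + 1)) else 1
        u_hat.append(bit)
        x_hat += bit
    return u_hat

def simulate_ber(snr_db, n_messages=200, n_bits=100, seed=0):
    """Empirical BER of the One State decoder at SNR = 1/sigma^2 (given in dB)."""
    rng = random.Random(seed)
    sigma = 10 ** (-snr_db / 20)
    errors = 0
    for _ in range(n_messages):
        u = [rng.randint(0, 1) for _ in range(n_bits)]
        x, y = 0, []
        for bit in u:                      # integrator followed by the AWGN channel
            x += bit
            y.append(x + rng.gauss(0.0, sigma))
        errors += sum(a != b for a, b in zip(u, one_state_decode(y)))
    return errors / (n_messages * n_bits)

ber_0db = simulate_ber(0.0)
ber_10db = simulate_ber(10.0)
```

Averaging over many short messages, as in the text, keeps the cost of the optimal decoders manageable; for the constant-complexity decoders a single long message could be used instead.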



4 Suboptimal Causal Decoding Algorithms: Theoretic Analysis

In this section, we propose an exhaustive theoretic analysis of the One State and
Two States algorithms and we provide a formal setting for the analytical
computation of their performance. According to definitions (11) and (14) in Section 2.4,
we will compute both the BER and the CBER, which respectively describe the
decoding performance for the "mean input" and for each possible input.

The natural setting of this analysis is the theory of Markov Processes, in
countably infinite or uncountable spaces (we will talk about Markov Chains
when the space is countably infinite).



4.1 Theoretic Analysis of the One State Algorithm 

Suppose we transmit $K$ (possibly infinite) bits and decode by the One State
method. The starting point of our analysis is the definition, at any step $k =
1, 2, 3, \dots$, of the r.v.

$$D_k = \hat{X}_k - X_k \in \mathbb{Z} \tag{23}$$

$X_k$ and $\hat{X}_k$ being defined in (10). $D_k$ actually represents the difference between the
estimated and the actual state values. With $D_0 = 0$, the following recursive
relationship holds:

$$D_{k+1} = D_k + \hat{U}_k - U_k \tag{24}$$

where $\hat{U}_{k-1} = \mathcal{D}^{(1)}(\mathbf{Y})_{k-1}$ (see the algorithm (19)). While the $U_k$'s are independent,
$\hat{U}_k$ is a function of $U_k$ and $D_k$. Then, the stochastic process $(D_k)_{k \in \mathbb{N}}$ is a Markov
Chain (whose definition is formally given in the next section), which can be
exploited to carry out our analysis; in order to do that, let us first review some
basic elements of Markov theory.



4.1.1 Markov Chains 

The definitions and results introduced in this Section can be found in
Chapter 3 of [20] or in Chapter 3 of [TT].

By Markov Chain we mean any sequence of random variables $(X_n)_{n=0,1,\dots}$
assuming values in a countable set $\mathcal{X}$ and satisfying the Markov property:
$\mathbb{P}(X_{n+1} = y \mid X_n = x, X_{n-1}, \dots, X_0) = \mathbb{P}(X_{n+1} = y \mid X_n = x)$. If the chain is
time-homogeneous, that is $\mathbb{P}(X_{n+1} = y \mid X_n = x) = \mathbb{P}(X_{n+m+1} = y \mid X_{n+m} = x)$,
the transition probabilities $P_{x,y} = \mathbb{P}(X_{n+1} = y \mid X_n = x)$ are the entries of the
stochastic transition probability matrix $\mathbf{P} \in [0, 1]^{\mathcal{X} \times \mathcal{X}}$.

We review some important properties of a Markov Chain $(X_n)_{n=0,1,\dots}$ on
$\mathcal{X} = \mathbb{Z}$:

Definition 1 [20, Section 3.1] Two states $x, y \in \mathbb{Z}$ communicate if there exist
$n, m \in \mathbb{N}$ s.t. $(\mathbf{P}^n)_{x,y} > 0$ and $(\mathbf{P}^m)_{y,x} > 0$. If all the states communicate, the
Markov Chain is said to be irreducible.

Definition 2 [20, Section 3.2.3] Let $T_j = \min\{n > 0 : X_n = j\}$: a state $j$ is
said to be positive recurrent if $\mathbb{E}(T_j \mid X_0 = j) < \infty$. The Markov Chain itself is
said to be positive recurrent if all its states are so.

Proposition 3 [20, Last part of Section 3.2.3] If a Markov Chain is irreducible
and has one positive recurrent state, then all the states are so, that is the chain
is positive recurrent.

Definition 4 [20, Section 3.2.3] An invariant (or stationary) probability vector
is a probability vector $\Phi$ (that is, $\Phi \in [0, 1]^{\mathcal{X}}$ and $\sum_{x \in \mathcal{X}} \Phi_x = 1$) such that
$\Phi^T \mathbf{P} = \Phi^T$.
The existence of an invariant probability vector, assured under some conditions,
gives an important convergence result, as stated in the following

Proposition 5 [20, Sections 3.2.3-3.2.4] An irreducible, positive recurrent Markov
Chain admits a unique invariant probability vector $\Phi$. Moreover, $\Phi$ is the limit
of the so-called Cesàro sum, that is

$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} (\mathbf{P}^k)_{x,y} = \Phi_y \qquad \text{for any } x, y \in \mathcal{X}.$$

4.1.2 The mean BER 

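Let us go back to the One State algorithm. According to (24), $(D_k)_{k \in \mathbb{N}}$ is a
countable homogeneous Markov Chain on $\mathbb{Z}$, with transition probabilities

$$P_{x,y} = \mathbb{P}(D_{k+1} = y \mid D_k = x) = \frac{1}{2}\left[P_{x,y}(0) + P_{x,y}(1)\right]$$

where $P_{x,y}(u) = \mathbb{P}(D_{k+1} = y \mid D_k = x, U_k = u)$, $u \in \{0, 1\}$. Notice that the only
non-null entries of $\mathbf{P}(u)$ are the following:

$$\begin{aligned}
P_{d,d+1}(0) &= \frac{1}{2}\,\mathrm{erfc}\left(\frac{d + 1/2}{\sqrt{2}\,\sigma}\right), & P_{d,d}(0) &= 1 - P_{d,d+1}(0)\\
P_{d,d}(1) &= \frac{1}{2}\,\mathrm{erfc}\left(\frac{d - 1/2}{\sqrt{2}\,\sigma}\right), & P_{d,d-1}(1) &= 1 - P_{d,d}(1).
\end{aligned}$$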
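The erfc expressions for the entries of $\mathbf{P}(u)$ can be checked directly from the decision rule (20) with $\tau = 1$. The short derivation below is a sketch under the convention $D_k = \hat{X}_k - X_k$, which is the one consistent with the recursion (24); the symbols $n$ (noise sample), $y$, $x$ and $\hat{x}$ (current sample, true state and estimated state) are introduced here only for illustration.

```latex
% Case U_k = 0: the new sample is y = x + n, with x = \hat{x} - d and n ~ N(0, \sigma^2).
% Rule (20) gives \hat{U}_k = 1 iff y > \hat{x} + 1/2, i.e. iff n > d + 1/2, hence
P_{d,d+1}(0) = \mathbb{P}\left(n > d + \tfrac{1}{2}\right)
             = \frac{1}{2}\,\mathrm{erfc}\left(\frac{d + 1/2}{\sqrt{2}\,\sigma}\right).
% Case U_k = 1: now y = \hat{x} - d + 1 + n, so \hat{U}_k = 1 iff n > d - 1/2, hence
P_{d,d}(1) = \mathbb{P}\left(n > d - \tfrac{1}{2}\right)
           = \frac{1}{2}\,\mathrm{erfc}\left(\frac{d - 1/2}{\sqrt{2}\,\sigma}\right).
% The remaining entries are the complementary probabilities.
```

Both identities use $\mathbb{P}(n > a) = \frac{1}{2}\mathrm{erfc}\left(a / (\sqrt{2}\,\sigma)\right)$ for a zero-mean Gaussian of variance $\sigma^2$.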

$\mathbf{P}$ is tridiagonal and, for any $x, y \in \mathbb{Z}$, $P_{x,y} = P_{-x,-y}$, and $P_{x,y} > 0$ if and only
if $|x - y| \le 1$; by iteration, for any $n \in \mathbb{N}$, $(\mathbf{P}^n)_{x,y} > 0$ if and only if $|x - y| \le n$.
Hence, given any couple of states $x, y \in \mathbb{Z}$ with distance $|x - y| = m$, $(\mathbf{P}^m)_{x,y} > 0$
and $(\mathbf{P}^m)_{y,x} > 0$, that is, $(D_k)_{k \in \mathbb{N}}$ is irreducible. Moreover,

Lemma 6 $(D_k)_{k \in \mathbb{N}}$ is positive recurrent.

Proof It suffices to apply the following criterion proposed in [30]: if there
exists a function $g \in \mathbb{R}_+^{\mathbb{Z}}$ such that $g_x \ge (\mathbf{P}g)_x + \varepsilon$ for any $x \in \mathbb{Z} \setminus \{y\}$ and for
some $\varepsilon > 0$, then $y$ is a positive recurrent state.

In our case, it is easy to prove that $y = 0$ is a positive recurrent state
considering $g_x = |x|$. Moreover, given that the chain is irreducible, if one state
is positive recurrent, all states are so. ∎

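Proposition 7 The following statements hold:

1. $(D_k)_{k \in \mathbb{N}}$ admits a unique invariant probability vector $\Phi$;
2. $\Phi$ is defined by

$$\Phi_d = \Phi_0 \prod_{i=1}^{|d|} \frac{P_{i-1,i}}{P_{i,i-1}} \tag{25}$$

where

$$\Phi_0 = \frac{1}{1 + 2\sum_{d=1}^{\infty} \prod_{i=1}^{d} P_{i-1,i}/P_{i,i-1}}.$$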
Proof (1) It follows from Proposition 5.

(2) By $(\Phi^T \mathbf{P})_d = \Phi_d$ for any $d \in \mathbb{Z}$, it follows that

$$\Phi_{d-1}P_{d-1,d} - \Phi_d P_{d,d-1} = c \quad (c \text{ constant}). \tag{26}$$

In particular, as $\Phi_d = \Phi_{-d}$ for any $d \in \mathbb{Z}$ (this is due to the uniqueness of the
invariant measure and to the symmetry of $\mathbf{P}$), it suffices to substitute values
$d = 0$ and $d = 1$ in (26) to conclude that $c = 0$; hence, relation (25) holds. ∎

Notice that $c = 0$ corresponds to the property of time-reversibility of a Markov
Chain (see Section 4.8 of [16]), hence one could even prove it by Theorem 4.2
in [TB], after having introduced the concepts of aperiodicity and ergodicity of a
Markov Chain.

From Proposition 7 we deduce in particular that for any $d \in \mathbb{Z}$, $\Phi_d > 0$.
Moreover, since $P_{i-1,i}/P_{i,i-1} < 1$ for $i \ge 1$, $\Phi_d$ has a maximum at $d = 0$ and it
is monotone decreasing for $d > 0$.

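As a consequence of Proposition 5,

Corollary 8 Let $q_d = \mathbb{P}[\hat{U}_k \neq U_k \mid D_k = d] = P_{d,d+1} + P_{d,d-1}$, then

$$\lim_{K \to \infty} P_b(e) = \sum_{d \in \mathbb{Z}} q_d \Phi_d.$$

Proof Since

$$P_b(e) = \frac{1}{K} \sum_{k=0}^{K-1} \mathbb{P}(\hat{U}_k \neq U_k) = \sum_{d \in \mathbb{Z}} q_d\, \frac{1}{K} \sum_{k=0}^{K-1} (\mathbf{P}^k)_{0,d},$$

the result follows from Proposition 5 and from Lebesgue's Dominated Convergence
Theorem. Indeed,

$$\lim_{K \to \infty} \sum_{d \in \mathbb{Z}} q_d\, \frac{1}{K} \sum_{k=0}^{K-1} (\mathbf{P}^k)_{0,d} = \sum_{d \in \mathbb{Z}} q_d \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} (\mathbf{P}^k)_{0,d}$$

where $\frac{1}{K} \sum_{k=0}^{K-1} (\mathbf{P}^k)_{0,d} \le 1$. ∎

This concludes the computation of the BER in case of long-time
transmission, given the distribution of the input source. In the next paragraph
we study how the performance depends on the transmitted input sequence.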
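The invariant vector (25) and the limit in Corollary 8 are easy to evaluate numerically once the sums over $\mathbb{Z}$ are truncated. The sketch below is only illustrative: the function names and the truncation level `d_max` are assumptions made here, and the one-step probabilities are taken as $P_{d,d+1} = \frac{1}{2}P_{d,d+1}(0)$ and $P_{d,d-1} = \frac{1}{2}P_{d,d-1}(1)$ with the erfc expressions of the One State chain, $\tau = 1$ and $D_k = \hat{X}_k - X_k$. The symmetry $P_{x,y} = P_{-x,-y}$ gives $q_{-d} = q_d$, so only $d \ge 0$ is computed.

```python
import math

def p_up(d, sigma):
    """P_{d,d+1}: input bit 0 (probability 1/2) decoded as 1."""
    return 0.25 * math.erfc((d + 0.5) / (math.sqrt(2) * sigma))

def p_down(d, sigma):
    """P_{d,d-1}: input bit 1 (probability 1/2) decoded as 0."""
    return 0.5 * (1.0 - 0.5 * math.erfc((d - 0.5) / (math.sqrt(2) * sigma)))

def asymptotic_ber(sigma, d_max=200):
    """Truncated evaluation of (25) and of sum_d q_d * Phi_d.

    Returns (ber, phi) with phi[d] = Phi_d for d = 0..d_max.
    """
    w = [1.0]                                   # unnormalised Phi_d / Phi_0
    for i in range(1, d_max + 1):
        w.append(w[-1] * p_up(i - 1, sigma) / p_down(i, sigma))
    z = w[0] + 2.0 * sum(w[1:])                 # Phi_d = Phi_{-d}
    phi = [v / z for v in w]
    q = [p_up(d, sigma) + p_down(d, sigma) for d in range(d_max + 1)]
    ber = phi[0] * q[0] + 2.0 * sum(phi[d] * q[d] for d in range(1, d_max + 1))
    return ber, phi

ber, phi = asymptotic_ber(sigma=1.0)
```

Since $\Phi_d$ decays very quickly in $|d|$, a moderate truncation level is enough in practice; the value can be compared with an empirical BER obtained from long simulated transmissions.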

4.1.3 The Conditional BER 

In the asymptotic case, the CBER converges to the same limit as the BER for
almost all the possible inputs:

Theorem 9 Let $\pi$ be the uniform Bernoulli probability measure over $\{0, 1\}^{\mathbb{N}}$.
Then, for the One State algorithm,

$$\lim_{K \to \infty} P_b(e \mid \mathbf{U}) = \lim_{K \to \infty} P_b(e) \qquad \text{for } \pi\text{-a.e. } \mathbf{U}.$$

Theorem 9 gives a stronger result than Corollary 8: the mean behavior of the
One State algorithm is stated to be the behavior for each possible input occurrence,
except for a $\pi$-negligible set. To prove Theorem 9, we will refer to the
theory of Markov Chains in Random Environments (see Sections 5.1 and 5.2 in
the Appendix).
4.2 Theoretic Analysis of the Two States Algorithm

Similarly to the One State algorithm, the Two States procedure can be studied
through Markov theory, which provides the instruments to compute both
BER and CBER. As shown in Section 3.2, the Two States procedure stores,
at each step, a state and its normalized probability, this information being
sufficient to identify also the second state and its probability. Let $\hat{X}_k$ be the r.v.
representing the stored state, $X_k$ the current correct state, $D_k = \hat{X}_k - X_k$ and
$A_k$ the r.v. corresponding to the probability of $\hat{X}_k$: now, the stochastic process
$(A_k, D_k)_{k \in \mathbb{N}}$ in $[0, 1] \times \mathbb{Z}$ is a Markov Process, whose definition (which actually
extends the definition of Markov Chain from a denumerable to a continuous set)
is now given.

4.2.1 Markov Processes 

The definitions and results introduced in this Section can be retrieved in |12j or 
in the Chapter 2 of [11] Consider a set X endowed with a countably generated 
a- field F. A transition probability kernel (or Markov probability kernel, see, e.g., 
[T2l Section 3.4.1]) on (X, J 7 ) is an application P : X x F — > [0, 1] such that 

(i) for each F E F, P(-, F) is a non-negative measurable function; 

(ii) for each x G X, P(x, •) is a probability measure (p.m. for short) on (X, J 7 ). 
Given a bounded measurable function v on (X.,F), we denote by Pv the 

bounded measurable function on (X, F) defined as 

{Pv)(x)= [ v(y)P(x,dy). (27) 

Further, let fi be a measure on (X, J 7 ): we define the measure [iP 

{pP){F)= f P(x,F)fj,(dx) FeF. (28) 

We define the n-th power of the transition kernel P simply putting P 1 (x,F) — 
P{x,F) and P n (x,F) = J^P(x,dy)P n - 1 (y 1 F). It is easy to see that P n (x,F) 
are transition kernels, too. Corresponding actions on bounded functions and on 
measures will be respectively denoted by P n v and \xP n . 

Definition 10 [12, (10.1)] A measure $\psi$ on $(X,\mathcal F)$ is said to be invariant for
the transition kernel $P$ if $\psi P = \psi$.

We define a homogeneous Markov Process on the space $(X,\mathcal F)$ with transition kernel
$P$ as a sequence of $X$-valued random variables $(X_n)_{n\in\mathbb N}$ such that, for any $x\in X$
and $F\in\mathcal F$,

$$\mathrm{Prob}(X_{n+1}\in F \mid X_n = x, X_{n-1},\dots,X_0) = \mathrm{Prob}(X_{n+1}\in F\mid X_n = x) = P(x,F)$$

for any $n\in\mathbb N$. The evolution of $(X_n)_{n\in\mathbb N}$ is completely described once we fix
a probability law $\mu$ of $X_0$ on $(X,\mathcal F)$; if $\mu$ is invariant, then the Markov Process
is said to be stationary: all the r.v.'s $X_n$ are distributed according to $\mu$. Notice
also that for any $x\in X$ and $F\in\mathcal F$, $\mathrm{Prob}(X_{m+n}\in F\mid X_m = x) = P^n(x,F)$ for
any $m,n\in\mathbb N$.

From now onwards, we will assume that $X$ is a locally compact separable
metric space: under this topological condition we can easily prove the existence
of an invariant measure (see [12, Section 12.3]). Let $\mathcal B(X)$ be the Borel $\sigma$-algebra
of $X$.

Definition 11 [12, Sections 6.1.1, 11.3.1] Let $P$ be a transition kernel on
$(X,\mathcal B(X))$. If $P(\cdot,O)$ is a lower semicontinuous function for any open set
$O\in\mathcal B(X)$, then $P$ is said to be weak Feller. Moreover, we say that $P$ verifies
the Drift Condition if there exist a compact set $C\subset X$, a constant $b<\infty$
and a function $V : X\to[0,\infty]$, not always infinite, such that

$$\Delta V(x) := \int_X P(x,dy)\,V(y) - V(x) \le -1 + b\,\mathbb 1_C(x)$$

for every $x\in X$.

Proposition 12 [12, Theorem 12.3.4] If a transition kernel $P$ is weak Feller
and verifies the Drift Condition, then it admits an invariant p.m.

Under some further conditions, also the uniqueness of the invariant measure can 
be proved. 

Definition 13 [12, Section 4.2.1] For any $B\in\mathcal B(X)$, let $\tau_B = \min\{n > 0 : X_n\in B\}$.
$(X_n)_{n\in\mathbb N}$ is said to be $\mu$-irreducible if there exists a measure $\mu$ on
$\mathcal B(X)$ such that for every $x\in X$, $\mu(B) > 0$ implies $P(\tau_B < +\infty\mid X_0 = x) > 0$.

A $\mu$-irreducible Markov Process whose kernel admits an invariant p.m. is said
to be positive recurrent [12], and

Proposition 14 [12, Theorem 10.0.1, Proposition 10.1.1] The kernel of a positive
recurrent Markov Process admits a unique invariant p.m.

Furthermore,

Definition 15 [11, Definitions 2.2.2, 2.4.1] A set $B\in\mathcal B(X)$ is said to be
invariant if $P(x,B)\ge\mathbb 1_B(x)$ for every $x\in X$.

A p.m. $\mu$ on $\mathcal B(X)$ is said to be ergodic if $\mu(B) = 0$ or $\mu(B) = 1$ for every
invariant set $B\in\mathcal B(X)$.

Proposition 16 [11, Proposition 2.4.3] If a Markov Process admits a unique
invariant p.m. $\mu$, then $\mu$ is ergodic.

A fundamental tool for our analysis is the Ergodic Theorem for Markov Processes,
which is the transposition into stochastic terms of Birkhoff's Individual
Ergodic Theorem ([Ml Theorem 1.14]). Here we report its version under
the ergodicity condition for an invariant p.m.; a more general statement is
available in the literature.

Theorem 17 (Ergodic Theorem) [11, Theorem 2.3.4 - Proposition 2.4.2]
Assume that a kernel $P$ on $(X,\mathcal B(X))$ admits an ergodic invariant p.m. $\mu$.
Then, for any non-negative function $v\in L_1(X,\mathcal B(X),\mu)$,

$$\lim_{K\to\infty}\frac1K\sum_{k=0}^{K-1}(P^k v)(x) = \int_X v\,d\mu \qquad\text{for $\mu$-a.e. } x\in X. \tag{29}$$

Finally, we report a result of direct convergence for the iterates of the kernel, in
the case of no periodic behavior.

Definition 18 [23, Section 3.6] A Markov Process is said to be strongly aperiodic
if there exist a set $A\subset X$, a measure $\nu$ and a constant $c>0$ such that
$P(x,B)\ge c\,\nu(B)$ for any $x\in A$, $B\in\mathcal B(X)$.

Now, let

$$\|P^n(x,\cdot)-\mu\| = 2\sup_{B\in\mathcal B(X)}|P^n(x,B)-\mu(B)|$$

be the total variation norm between the measures $P^n(x,\cdot)$ and $\mu$.

Proposition 19 [23, Proposition 3.8] For a positive recurrent, aperiodic Markov
Process with invariant p.m. $\mu$, $\|P^n(x,\cdot)-\mu\|\to0$ as $n\to\infty$ for $\mu$-a.e. $x\in X$.
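
The convergence stated in Proposition 19 can be observed numerically on a finite state space. The following is a minimal sketch under stated assumptions: the three-state matrix `P`, its invariant measure `mu` and the function names are hypothetical and serve only as a toy example. On a finite set, the total variation norm defined above coincides with the $\ell_1$ distance between the probability vectors.

```python
def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_variation(p, q):
    """On a finite set, 2 * sup_B |p(B) - q(B)| equals sum_i |p_i - q_i|."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical aperiodic, irreducible toy kernel and its invariant measure.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
mu = [0.25, 0.50, 0.25]


def worst_case_distances(P, mu, n_max):
    """max over x of ||P^n(x, .) - mu|| for n = 1, ..., n_max."""
    distances = []
    Pn = P
    for _ in range(n_max):
        distances.append(max(total_variation(row, mu) for row in Pn))
        Pn = matmul(Pn, P)
    return distances


distances = worst_case_distances(P, mu, 20)
print(distances[0], distances[-1])  # the distance shrinks as n grows
```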



4.2.2 The Mean BER 



Let $A_k$ be the r.v. representing the normalized probability of the stored state
in the Two States algorithm. We observe that $(A_k,D_k)_{k\in\mathbb N}$ is a Markov Process
in $([0,1]\times\mathbb Z,\ \mathcal B([0,1])\times\mathcal P(\mathbb Z))$, where $\mathcal B([0,1])$ is the Borel $\sigma$-field on $[0,1]$ and
$\mathcal P(\mathbb Z)$ is the discrete $\sigma$-field of $\mathbb Z$. In order to completely define the process, we
also provide an initial distribution $\mathcal L\times\kappa$, $\mathcal L$ and $\kappa$ being respectively the usual
Lebesgue measure on $[0,1]$ and the counting measure on $\mathbb Z$.

The transition probability kernels are explicitly computed in Appendix 5.3.



Proposition 20 The kernel of $(A_k,D_k)_{k\in\mathbb N}$ admits an invariant p.m. $\phi$.

Proof We prove that the kernel of $(A_k,D_k)$ satisfies both the Weak Feller
Property and the Drift Condition; the result will then follow from Proposition 12.
First, we check the Drift Condition. By equations (49)-(51) in the Appendix,

$$\begin{aligned}
P((a,d),[0,1]\times\{d+1\}) &= \frac14\,\mathrm{erfc}\!\left(\frac{\sigma^2\log\sqrt{\frac{a}{1-a}}+d+1}{\sigma\sqrt2}\right),\\
P((a,d),[0,1]\times\{d-1\}) &= \frac12-\frac14\,\mathrm{erfc}\!\left(\frac{\sigma^2\log\sqrt{\frac{a}{1-a}}+d}{\sigma\sqrt2}\right).
\end{aligned}\tag{30}$$

In particular, $P((a,d),[0,1]\times\{d+1\})$ and $P((a,d),[0,1]\times\{d-1\})$ have values
in $[0,1/2]$ and are monotone, respectively decreasing and increasing, with respect
to $a$. Now, let us define

$$\delta_d = \frac{1}{2(|d|+10)} \tag{31}$$

and

$$V(a,d) = \begin{cases} d^2 & \text{if } d\ge0,\ a\ge\delta_d \text{ or if } d<0,\ a\le1-\delta_d,\\ d^2+2|d| & \text{otherwise.}\end{cases}\tag{32}$$
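
The quantities entering the drift argument are easy to evaluate numerically. The sketch below is only indicative: it assumes that (30) carries the prefactor $1/4$ and the term $\sigma^2\log\sqrt{a/(1-a)}$, that (31) reads $\delta_d = 1/(2(|d|+10))$, and that (32) adds $2|d|$ to $d^2$ outside the two listed regions. These readings are reconstructions of damaged formulas, and the function names are hypothetical.

```python
import math


def delta(d):
    """Assumed reading of (31): delta_d = 1 / (2 (|d| + 10))."""
    return 1.0 / (2.0 * (abs(d) + 10))


def V(a, d):
    """Assumed reading of (32): d^2 on the two 'good' regions, d^2 + 2|d| otherwise."""
    if (d >= 0 and a >= delta(d)) or (d < 0 and a <= 1.0 - delta(d)):
        return float(d * d)
    return float(d * d + 2 * abs(d))


def step_probabilities(a, d, sigma):
    """Assumed reading of (30), for 0 < a < 1.

    Returns (P(d -> d+1), P(d -> d-1)) marginalised over the first component.
    """
    shift = sigma ** 2 * 0.5 * math.log(a / (1.0 - a))  # sigma^2 log sqrt(a/(1-a))
    scale = sigma * math.sqrt(2.0)
    p_up = 0.25 * math.erfc((shift + d + 1) / scale)
    p_down = 0.5 - 0.25 * math.erfc((shift + d) / scale)
    return p_up, p_down


for a in (0.1, 0.5, 0.9):
    print(a, step_probabilities(a, 0, 1.0))
```

Under these assumptions both probabilities stay in $[0,1/2]$, their sum does not exceed $1/2$, and $2(d+1)\delta_d<1$ for every $d\ge0$, which are the three facts used in the drift computation.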

We are going to prove that $V$ fulfills the Drift inequality for some compact $C$:

$$\Delta V(a,d) = \int_{[0,1]\times\mathbb Z} P((a,d),d(a',d'))\,V(a',d') - V(a,d) \le -1 + b\,\mathbb 1_C(a,d) \tag{33}$$

for every $(a,d)\in[0,1]\times\mathbb Z$. In order to determine $C$, let us find the
values of $(a,d)$ such that (33) holds with $\mathbb 1_C(a,d) = 0$. Recall that
$P((a,d),A\times\{d'\}) > 0 \Rightarrow d'\in\{d-1,d,d+1\}$ for any $a\in[0,1]$, $A\in\mathcal B([0,1])$.
In what follows, we use the notation $\omega = (a,d)$, $\omega' = (a',d')$.
If $d>0$,

$$\begin{aligned}
\Delta V(\omega) &= \int_0^1\sum_{d'=d-1}^{d+1} P(\omega,(da',d'))\,V(\omega') - V(\omega)\\
&= \sum_{d'=d-1}^{d+1}\left[\int_0^{\delta_d}P(\omega,(da',d'))\,(2d'+d'^2) + \int_{\delta_d}^1 P(\omega,(da',d'))\,d'^2\right] - V(\omega)\\
&= \sum_{d'=d-1}^{d+1}\left[\int_0^1 P(\omega,(da',d'))\,d'^2 + \int_0^{\delta_d}P(\omega,(da',d'))\,2d'\right] - V(\omega)\\
&= \sum_{d'=d-1}^{d+1}\left[P(\omega,[0,1]\times\{d'\})\,d'^2 + P(\omega,[0,\delta_d]\times\{d'\})\,2d'\right] - V(\omega)\\
&= d^2 + 2d\,\big[P(\omega,[0,1]\times\{d+1\}) - P(\omega,[0,1]\times\{d-1\})\big]\\
&\quad + P(\omega,[0,1]\times\{d+1\}) + P(\omega,[0,1]\times\{d-1\}) + 2d\,P(\omega,[0,\delta_d]\times\mathbb Z)\\
&\quad + 2\,\big[P(\omega,[0,\delta_d]\times\{d+1\}) - P(\omega,[0,\delta_d]\times\{d-1\})\big] - V(\omega).
\end{aligned}$$

As $P(\omega,[0,1]\times\{d+1\}) + P(\omega,[0,1]\times\{d-1\}) \le \frac12$ (see equations (30)) and
$P(\omega,[\beta_1,\beta_2]\times\mathbb Z)\le G(\beta_2-\beta_1)$ (see Lemma 28 in Appendix 5.6),

$$\begin{aligned}
\Delta V(\omega) &\le d^2 + 2d\,\big[P(\omega,[0,1]\times\{d+1\}) - P(\omega,[0,1]\times\{d-1\})\big] + \frac12 + 2(d+1)G\delta_d - V(\omega)\\
&\le d^2 + 2d\,\big[P(\omega,[0,1]\times\{d+1\}) - P(\omega,[0,1]\times\{d-1\})\big] + \frac12 + G - V(\omega)
\end{aligned}\tag{34}$$

where we exploited that $2(d+1)G\delta_d < G$ by the definition (31) of $\delta_d$.

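If $d<0$, by an analogous computation we obtain again the inequality (34). Let
us study the behavior of this bound for every $\omega\in[0,1]\times\mathbb Z$, according to the
partition of $[0,1]\times\mathbb Z$ into four subsets given by the definition of $V$.

Subset 1: If $d\ge0$ and $a\ge\delta_d$, $V(\omega) = d^2$ and

$$\begin{aligned}
P(\omega,[0,1]\times\{d+1\}) &\le \frac14\,\mathrm{erfc}\!\left(\frac{\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d+1}{\sigma\sqrt2}\right),\\
P(\omega,[0,1]\times\{d-1\}) &\ge \frac12-\frac14\,\mathrm{erfc}\!\left(\frac{\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d}{\sigma\sqrt2}\right),
\end{aligned}$$

hence inequality (34) becomes

$$\begin{aligned}
\Delta V(\omega) &\le G + \frac12 + d\left[\mathrm{erfc}\!\left(\frac{\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d}{\sigma\sqrt2}\right) - 1\right]\\
&= G + \frac12 + d\left[\mathrm{erfc}\!\left(\frac{-\frac{\sigma^2}{2}\log(2d+19)+d}{\sigma\sqrt2}\right) - 1\right].
\end{aligned}$$

As $\mathrm{erfc}(x)\in(1,2)$ only when the argument $x$ is negative, for $d$ sufficiently
large the quantity in the square brackets is negative. Moreover, this quantity is
multiplied by $d$; hence, there necessarily exists an integer $d_0^+ > 0$, depending on
the noise variance $\sigma^2$, such that $\Delta V(\omega)\le-1$ for any $d\ge d_0^+$.

Subset 2: If $d<0$ and $a\le1-\delta_d$,

$$\begin{aligned}
P(\omega,[0,1]\times\{d+1\}) &\ge \frac14\,\mathrm{erfc}\!\left(\frac{-\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d+1}{\sigma\sqrt2}\right),\\
P(\omega,[0,1]\times\{d-1\}) &\le \frac12-\frac14\,\mathrm{erfc}\!\left(\frac{-\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d}{\sigma\sqrt2}\right),
\end{aligned}$$

hence inequality (34) becomes

$$\begin{aligned}
\Delta V(\omega) &\le G + \frac12 + d\left[\mathrm{erfc}\!\left(\frac{-\sigma^2\log\sqrt{\frac{\delta_d}{1-\delta_d}}+d+1}{\sigma\sqrt2}\right) - 1\right]\\
&= G + \frac12 + d\left[\mathrm{erfc}\!\left(\frac{\frac{\sigma^2}{2}\log(-2d+19)+d+1}{\sigma\sqrt2}\right) - 1\right].
\end{aligned}$$

The computation is now analogous to the previous case and we conclude that
there necessarily exists an integer $d_0^- < 0$, depending on the noise, such that
$\Delta V(\omega)\le-1$ for any $d\le d_0^-$.

Subset 3: If $d\ge0$ and $a<\delta_d$, $V(\omega) = d^2+2d$; moreover, we have no tight
bounds for $P(\omega,[0,1]\times\{d+1\})$ and $P(\omega,[0,1]\times\{d-1\})$: we can just notice
that their difference is smaller than $\frac12$. Substituting it in (34) we obtain

$$\Delta V(\omega) \le d^2 + G + \frac12 + d - d^2 - 2d = G + \frac12 - d,$$

hence $\Delta V(\omega)\le-1$ if $d\ge d_1 = G+\frac32$.

Subset 4: If $d<0$ and $a>1-\delta_d$, $V(\omega) = d^2-2d$; as
$P(\omega,[0,1]\times\{d+1\}) - P(\omega,[0,1]\times\{d-1\}) \ge -\frac12$,

$$\Delta V(\omega) \le G + \frac12 + d$$

and $\Delta V(\omega)\le-1$ if $d\le-d_1$.

Now, it is easy to verify that the subsets of $[0,1]\times\mathbb Z$ not yet considered
form the compact set $([0,\delta_d]\times\{0,\dots,d_1\}) \cup ([\delta_d,1]\times\{0,\dots,d_0^+\}) \cup
([0,1-\delta_d]\times\{d_0^-,\dots,-1,0\}) \cup ([1-\delta_d,1]\times\{-d_1,\dots,-1,0\})$. For simplicity,
we can consider the bigger compact set $C = [0,1]\times\{-d_C,\dots,d_C\}$, where
$d_C = \max\{d_0^+,-d_0^-,d_1\}$: now, it is easy to check that for any $\omega\in C$ the Drift
Condition is satisfied whenever $b\ge G+d_C+\frac32$.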

We now check the Weak Feller Property. Given any open interval $I\subset[0,1]$
and $d'\in\mathbb Z$, the continuity of $P(\cdot,I\times\{d'\})$ can be easily verified from equations
(49)-(51) (Section 5.3): $P((a,d),I\times\{d'\})$ is piecewise defined as a combination
of values of $H$, which is a continuous function; moreover, it is straightforward to check
that the continuity holds also at the connection points.
Furthermore,

(a) any open set on the real line (hence on $[0,1]$) is a countable union of disjoint
intervals;

(b) if $f_N$ is a monotone increasing sequence of lower semicontinuous functions
such that $f_N\uparrow f$ pointwise, then $f$ is lower semicontinuous.

By (a), any open set $O$ in $[0,1]$ can be expressed as $O = \cup_{n=1}^\infty I_n$, with $I_n$
mutually disjoint open intervals in $[0,1]$. Moreover, $f_N(\omega) = P(\omega,(\cup_{n=1}^N I_n)\times\{d'\})\le1$
fulfills the hypotheses of statement (b), hence its pointwise limit
$f(\omega) = P(\omega,(\cup_{n=1}^\infty I_n)\times\{d'\}) = P(\omega,O\times\{d'\})$ is lower semicontinuous. As any
open set of the product topology can be expressed as $\cup_{n\in\mathbb Z}(O_n\times\{n\})$, $O_n$ open
in $[0,1]$, the lower semicontinuity is extended to all the open sets. ■

Given the existence of an invariant p.m., we now evaluate the BER by means of the
Ergodic Theorem 17. The BER is given by

$$\begin{aligned}
P_b(e) &= \frac1K\sum_{k=0}^{K-1}P(\hat U_k\ne U_k) = \frac1K\sum_{k=0}^{K-1}\int_0^1\sum_{d\in\mathbb Z}P(\hat U_k\ne U_k, A_k = a, D_k = d)\,da\\
&= \frac1K\sum_{k=0}^{K-1}\int_0^1\sum_{d\in\mathbb Z}P(\hat U_k\ne U_k\mid A_k = a, D_k = d)\,P^k((1,0),(da,d)),
\end{aligned}$$

the initial state $(1,0)$ being discussed in Remark 2. Let $q(a,d) = P(\hat U_k\ne U_k\mid A_k = a, D_k = d)$
(notice that $q(a,d)$ actually does not depend on $k$), so that
$P_b(e) = \frac1K\sum_{k=0}^{K-1}(P^kq)(1,0)$. Then,

Corollary 21 Given the invariant p.m. $\phi$,

$$\lim_{K\to\infty}P_b(e) = \int_{[0,1]\times\mathbb Z} q\,d\phi.$$

Proof $(A_k,D_k)_{k\in\mathbb N}$ is $(\mathcal L\times\kappa)$-irreducible (the proof of this fact requires some
technical computation and is postponed to Appendix 5.5); then $\phi$ is unique
and ergodic by Propositions 14 and 16. Therefore, by the Ergodic Theorem 17,

$$\lim_{K\to\infty}\frac1K\sum_{k=0}^{K-1}(P^kq)(\omega) = \int_{[0,1]\times\mathbb Z} q\,d\phi \qquad\text{for $\phi$-a.e. }\omega.$$

This result cannot be immediately applied to evaluate the BER, since the convergence
is not assured for all the initial states. In particular, let $N\subset[0,1]\times\mathbb Z$ be
the negligible set for which there is no convergence and let $N_0 = \{a\in[0,1] : (a,0)\in N\}$.
Now, recalling the Remark]!]

$$\begin{aligned}
P_b(e) &= \frac1K q(1,0) + \frac1K\sum_{k=1}^{K-1}\int_0^1\sum_{d_1\in\mathbb Z}P((1,0),(da_1,d_1))\,(P^{k-1}q)(a_1,d_1)\\
&= \frac1K q(1,0) + \frac1K\sum_{k=1}^{K-1}\int_0^1 P((1,0),(da_1,0))\,(P^{k-1}q)(a_1,0).
\end{aligned}$$

By Lebesgue's Dominated Convergence Theorem,

$$\lim_{K\to\infty}P_b(e) = \int_0^1 P((1,0),(da_1,0))\lim_{K\to\infty}\frac1K\sum_{k=1}^{K-1}(P^{k-1}q)(a_1,0).$$

Notice that $\mathcal L(N_0) = 0$, otherwise
$\phi(N_0\times\{0\}) = \int_{[0,1]\times\mathbb Z}P(\omega,N_0\times\{0\})\,\phi(d\omega) \ge C_{\varepsilon,0}\,\mathcal L(N_0) > 0$
by Proposition 27. By Lemma 28, this implies that $P((1,0),N_0\times\{0\}) = 0$. Finally,

$$\begin{aligned}
\lim_{K\to\infty}P_b(e) &= \int_{a_1\in[0,1]\setminus N_0}P((1,0),(da_1,0))\lim_{K\to\infty}\frac1K\sum_{k=1}^{K-1}(P^{k-1}q)(a_1,0)\\
&= \int_{a_1\in[0,1]\setminus N_0}P((1,0),(da_1,0))\int_{[0,1]\times\mathbb Z}q\,d\phi = \int_{[0,1]\times\mathbb Z}q\,d\phi
\end{aligned}$$

as $(a_1,0)\notin N$. ■

The function $q(a,d)$ is explicitly computed in Appendix 5.4.
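
The content of Corollary 21 can be illustrated on a finite toy chain: the Cesàro average $\frac1K\sum_{k=0}^{K-1}(P^kq)(x_0)$ approaches $\sum_x q(x)\phi(x)$. The sketch below is not the Two States kernel: the three-state matrix `P`, the error profile `q` and the starting state are hypothetical values chosen only to show the mechanism.

```python
def cesaro_ber(P, q, x0, K):
    """(1/K) * sum_{k=0}^{K-1} (P^k q)(x0), computed by iterating v <- P v."""
    v = list(q)
    total = 0.0
    for _ in range(K):
        total += v[x0]
        v = [sum(p * vy for p, vy in zip(row, v)) for row in P]
    return total / K


# Hypothetical toy kernel, invariant measure and per-state error probability.
P = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
phi = [0.25, 0.50, 0.25]
q = [0.1, 0.3, 0.2]

limit = sum(qx * px for qx, px in zip(q, phi))   # integral of q with respect to phi
print(cesaro_ber(P, q, 0, 2000), limit)
```

For this toy chain the two printed numbers agree up to a term of order $1/K$, which is the finite-state counterpart of the limit in the corollary.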

4.2.3 The Conditional BER 

The CBER for the Two States algorithm can be derived just as we computed it
for the One State case; in fact, the following holds.

Theorem 22 Let $\pi$ be the uniform Bernoulli probability measure over $\{0,1\}^{\mathbb N}$.
Then, for the Two States algorithm,

$$\lim_{K\to\infty}P_b(e\mid\mathbf U) = \lim_{K\to\infty}P_b(e) \qquad\text{for $\pi$-a.e. }\mathbf U.$$

We refer the reader to Appendix 5.7 for the proof.



4.3 Direct Convergence to $\phi$

The explicit construction of an invariant p.m. is an intricate issue in the
uncountable framework. When ergodic results are available, one can approximate
it by several procedures (see, e.g., [11, Chapter 12]). In our framework, we can
obtain an approximation by Proposition 19, which states the direct convergence
of the iterates $P^n(\cdot,\cdot)$ to the invariant p.m. Before illustrating that, let us prove
that the hypotheses of Proposition 19 hold.

Proposition 23 The Markov Process $(A_k,D_k)_{k\in\mathbb N}$ is strongly aperiodic.

Proof Let us consider the probability measure $\mathcal L\times\delta_{\bar d}$ on $([0,1]\times\mathbb Z,\ \mathcal B([0,1])\times\mathcal P(\mathbb Z))$,
where $\mathcal L$ is the Lebesgue measure and $\delta_{\bar d}(d) = 1$ if $d = \bar d$, $0$ otherwise. By
Proposition 27, $P((a,\bar d),M\times\{\bar d\}) \ge \frac12 C_{\varepsilon,\bar d}\,\mathcal L(M)$, $C_{\varepsilon,\bar d}>0$. Then, considering
Definition 18 with $\nu = \mathcal L\times\delta_{\bar d}$, $c = \frac12 C_{\varepsilon,\bar d}$ and $A = [0,1]\times\{\bar d\}$, the proposition
is proved. ■

This result along with Proposition 19 yields:

Corollary 24 (Direct Convergence) $\|P^n((a,d),\cdot)-\phi\|\to0$ as $n\to\infty$ for
$\phi$-a.e. $(a,d)\in[0,1]\times\mathbb Z$.



Figure 4: One State: analytic computation vs simulation. [Plot not reproduced; horizontal axis: SNR (dB); curves: One State simulation and One State analytic computation.]

Figure 5: Two States: analytic computation vs simulation.



4.3.1 Analytic vs Simulations' outcomes 

To conclude our analysis of the One State and Two States algorithms, we compare
the simulation outcomes with the theoretical results: we expect the BERs obtained
by simulating sufficiently long transmissions to be consistent with
the analytic computations.

By Corollaries 8 and 21, the BERs can be computed once we know the
corresponding invariant distributions. While for the One State algorithm the
invariant measure is explicitly given by (25), for the Two States algorithm we
have approximated it using Corollary 24. In particular, we have discretized
the kernel $P$ into a matrix and then computed the iterates $P^n$ for a
sufficiently large $n$, so as to reach an equilibrium condition, that is, a matrix
whose rows are all equal up to numerical roundoff. At this point, any row of
the matrix is a discretized, approximated version of the invariant p.m.

In Figures 4 and 5 we compare analytic and simulation outcomes: as
expected, they do not present substantial differences.
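
The discretize-and-iterate procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation used for Figure 5: the kernel below is a hypothetical toy kernel on $[0,1]$ (the next state is uniform on $[x/2, x/2+1/2]$), the midpoint discretization and the repeated squaring are illustrative choices, and all function names are invented for the example.

```python
def discretize_kernel(n_bins):
    """Discretize a toy kernel on [0, 1] into an n_bins x n_bins matrix.

    Hypothetical kernel (not the Two States one): P(x, .) is uniform on
    [x/2, x/2 + 1/2]. Row i uses the midpoint of bin i as starting point.
    """
    P = []
    for i in range(n_bins):
        x = (i + 0.5) / n_bins
        lo, hi = x / 2.0, x / 2.0 + 0.5
        row = []
        for j in range(n_bins):
            left, right = j / n_bins, (j + 1) / n_bins
            overlap = max(0.0, min(hi, right) - max(lo, left))
            row.append(overlap / 0.5)  # mass of the uniform law falling in bin j
        P.append(row)
    return P


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def equilibrium_row(P, tol=1e-12, max_squarings=60):
    """Square the matrix until all rows agree up to tol; return (row, exponent)."""
    Pn, n = P, 1
    size = len(P)
    for _ in range(max_squarings):
        spread = max(abs(Pn[i][j] - Pn[0][j])
                     for i in range(size) for j in range(size))
        if spread < tol:
            return Pn[0], n
        Pn, n = matmul(Pn, Pn), 2 * n
    raise RuntimeError("rows did not agree within the allowed number of squarings")


P = discretize_kernel(20)
phi, exponent = equilibrium_row(P)
print(exponent, sum(phi))
```

Any row of the returned matrix power plays the role of the discretized invariant p.m.; a long-run average such as the BER is then approximated by the scalar product of this row with the discretized function $q$.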



5 Appendix 

5.1 Markov Chains in Random Environments 

Consider a countable set $\Theta$ and a family of transition probability kernels
$\{P_\theta,\ \theta\in\Theta\}$ on a space $(X,\mathcal F)$. Given a $\sigma$-field $\mathcal B$ of $\Theta$, let $(\theta_n)_{n\in\mathbb N}$ and
$(X_k)_{k\in\mathbb N}$ respectively be sequences of $\Theta$-valued and $X$-valued r.v.'s. $P_{\theta_k}(X_k,F)$ can now be
interpreted as the transition probability of $X_k$ to the set $F$ depending on the r.v. $\theta_k$,
which represents the so-called random environment.



We say that $(X_k)_{k\in\mathbb N}$ with $(\theta_n)_{n\in\mathbb Z}$ is a Markov Chain in Random Environment
(or MCRE) if

$$P(X_{k+1}\in F\mid X_k,\dots,X_0,(\theta_n)_{n\in\mathbb Z}) = P_{\theta_k}(X_k,F)\quad\text{a.s.}\tag{35}$$

for all $F\in\mathcal F$ and $k = 0,1,\dots$

Let us define $\Theta^{\mathbb N} = \prod_0^\infty\Theta$ and $\mathcal B^{\mathbb N} = \prod_0^\infty\mathcal B$. An important feature of a
MCRE is that we can always associate to it a classical Markov Process. In fact,
given any $x\in X$ and $\theta = (\theta_0,\theta_1,\dots)\in\Theta^{\mathbb N}$ and denoting by $T$ the left sequence
shift on $\Theta^{\mathbb N}$ (that is, $T\theta = \theta'$ with $\theta'_n = \theta_{n+1}$ for any $n\in\mathbb N$), we can introduce
the following transition probability kernel on $(X\times\Theta^{\mathbb N},\ \mathcal F\times\mathcal B^{\mathbb N})$:

$$P((x,\theta),F\times B) = P_{\theta_0}(x,F)\,\mathbb 1_B(T\theta),\tag{36}$$

which determines a Markov Process $(X_k,T^k(\theta_n)_{n\in\mathbb N})_{k\in\mathbb N}$ on $(X\times\Theta^{\mathbb N},\ \mathcal F\times\mathcal B^{\mathbb N})$.
From now onwards, we will refer to it as the Extended Markov Process, EMP
for short.

Remark 3: As noted in Section 1 of f^j, if the random environments $\theta_n$
are independent, then $(X_k)_{k\in\mathbb N}$ is a Markov Process with transition probability
kernel $P(x,F) = \mathbb E_{\theta_0\in\Theta}[P_{\theta_0}(x,F)]$. In other terms, $(X_k)_{k\in\mathbb N}$ is the Markov
Process moving in the average environment.

In this framework, we prove the following

Proposition 25 Let $(X_k)_{k\in\mathbb N}$ with $(\theta_n)_{n\in\mathbb N}$ be a MCRE on $X\times\Theta^{\mathbb N}$. Suppose
that the random environments $\theta_n$ are independent, identically distributed with
distribution $\pi_0$ on $(\Theta,\mathcal B)$ and that the kernel of the Markov Process $(X_k)_{k\in\mathbb N}$
admits an invariant p.m. $\phi$; given the distribution $\pi = \times_{n=0}^\infty\pi_0$ over $(\Theta^{\mathbb N},\mathcal B^{\mathbb N})$,

$$\psi = \phi\times\pi\tag{37}$$

is an invariant p.m. for the EMP $(X_k,T^k(\theta_n)_{n\in\mathbb N})_{k\in\mathbb N}$ over $(X\times\Theta^{\mathbb N},\ \mathcal F\times\mathcal B^{\mathbb N})$.

Proof Let $\omega = (x,\theta)\in X\times\Theta^{\mathbb N}$. $\psi$ is invariant for the EMP if

$$\int_{X\times\Theta^{\mathbb N}}P(\omega,F\times B)\,\psi(d\omega) = \psi(F\times B)$$

for any $F\times B$ such that $F\in\mathcal F$, $B\in\mathcal B^{\mathbb N}$. Now,

$$\begin{aligned}
\int_{X\times\Theta^{\mathbb N}}P(\omega,F\times B)\,\psi(d\omega) &= \int_X\int_{\Theta^{\mathbb N}}P_{\theta_0}(x,F)\,\mathbb 1_B(\theta_1,\theta_2,\dots)\,\pi(d\theta)\,\phi(dx)\\
&= \pi(B)\int_X P(x,F)\,\phi(dx) = \pi(B)\,\phi(F) = \psi(F\times B)
\end{aligned}$$

where we have exploited the fact that $\phi$ is invariant. ■

This Proposition is a partial extension of Theorem 5 in [13], which states the same result in
the case of a denumerable state space $X$ and also proves the converse implication
(that is, all the invariant p.m.'s are product measures of the kind (37)), still in the
denumerable framework.
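
The role of the average environment in Remark 3 and Proposition 25 can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the two kernels in `P_ENV`, the environment law `PI0` and the function names are hypothetical. It computes the averaged kernel, approximates its invariant measure $\phi$, and verifies the identity $\sum_x\sum_{\theta_0}\phi(x)\,\pi_0(\theta_0)\,P_{\theta_0}(x,\cdot)=\phi$, which is the $X$-marginal of the invariance of $\psi=\phi\times\pi$.

```python
# Two hypothetical kernels on X = {0, 1, 2}, one per environment value.
P_ENV = {
    0: [[0.9, 0.1, 0.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]],
    1: [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.1, 0.9]],
}
PI0 = {0: 0.5, 1: 0.5}  # i.i.d. uniform environment


def average_kernel(kernels, pi0):
    """Kernel of the chain in the average environment: E[P_theta0(x, y)]."""
    n = len(next(iter(kernels.values())))
    return [[sum(pi0[t] * kernels[t][x][y] for t in kernels) for y in range(n)]
            for x in range(n)]


def invariant_measure(P, iterations=2000):
    """Approximate the invariant measure by iterating mu <- mu P."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def emp_marginal_push_forward(phi, kernels, pi0):
    """X-marginal of psi P for psi = phi x pi (first environment integrated out)."""
    n = len(phi)
    return [sum(phi[x] * pi0[t] * kernels[t][x][y]
                for x in range(n) for t in kernels)
            for y in range(n)]


P_bar = average_kernel(P_ENV, PI0)
phi = invariant_measure(P_bar)
print(phi, emp_marginal_push_forward(phi, P_ENV, PI0))
```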

For a more detailed treatise on MCRE's, we refer the reader to [51 R)l [151,114) . 



5.2 Proof of Theorem M 



From equation (24), $(D_k)_{k\in\mathbb N}$ with $(U_k)_{k\in\mathbb N}$ turns out to be a countable MCRE.
This is the right way to look at $(D_k)_{k\in\mathbb N}$ if we want to understand its behavior
with respect to typical instances of the input $\mathbf U = (U_0,U_1,\dots)$. For any
$x,y\in\mathbb Z$, we have

$$P(D_{k+1} = y\mid D_k = x, D_{k-1},\dots,D_0;\mathbf U) = P_{x,y}(U_k).$$

Consider the space $(\mathbb Z\times\{0,1\}^{\mathbb N},\ \mathcal P(\mathbb Z)\times\prod_0^\infty\mathcal P(\{0,1\}))$ endowed with the initial
distribution $\kappa\times\pi$, where $\kappa$ is the counting measure on $\mathbb Z$ and $\pi$ is the usual
uniform Bernoulli measure on $\{0,1\}^{\mathbb N}$. Given $x,y\in\mathbb Z$, $u = (u_0,u_1,\dots)\in\{0,1\}^{\mathbb N}$
and $B\in\prod_0^\infty\mathcal P(\{0,1\})$, the EMP is defined by the transition probability
kernel

$$P((x,u);\{y\}\times B) = P_{x,y}(u_0)\,\mathbb 1_B(Tu).\tag{38}$$

By Proposition 25, an invariant probability measure exists for our EMP and we
can compute it explicitly: let $\phi$ be the p.m. on $(\mathbb Z,\mathcal P(\mathbb Z))$ given by $\phi(\{d\}) = \Phi_d$,
$\Phi_d$ being the invariant probability vector defined in Proposition 7, for any
integer $d$. Then, $\psi = \phi\times\pi$ is an invariant p.m. for the EMP.
We can verify that $\psi$ is ergodic by the following criterion (see Chapter 3 of [S]).
Let $P(U_0,\dots,U_{n-1})$ be the transition matrix whose entries are

$$P_{x,y}(U_0,\dots,U_{n-1}) = P(D_n = y\mid D_0 = x, U_0,\dots,U_{n-1}).\tag{39}$$

If for each $x,y\in\mathbb Z$ and $\pi$-a.e. $\mathbf U$ there exist $n = n(x,y,\mathbf U)\in\mathbb N$ and
$z = z(x,y,\mathbf U,n)\in\mathbb Z$ such that $P_{x,z}(U_0,\dots,U_{n-1})\,P_{y,z}(U_0,\dots,U_{n-1}) > 0$, then
$\psi$ is ergodic. In our context it is easy to check that, given any pair of starting
states $x$ and $y$, after $n\ge|x-y|$ steps we have a non-null probability of having
reached a common state $z$.

Define $q_d(U_k) = P[\hat U_k\ne U_k\mid D_k = d, U_k] = P_{d,d+1}(U_k) + P_{d,d-1}(U_k)$ ($q_d$ is
actually the mean of $q_d(U_k)$). For any $K\in\mathbb N$ and given $D_0 = 0$, the CBER can be
expressed as follows:

$$P_b(e\mid\mathbf U) = \frac1K\sum_{k=0}^{K-1}\sum_{d\in\mathbb Z}q_d(U_k)\,P_{0,d}(U_0,U_1,\dots,U_{k-1}).\tag{40}$$

Notice that, since the $U_k$'s, $k\in\mathbb N$, are independent,
$P(U_0,\dots,U_{k-1}) = P(U_0)P(U_1)\cdots P(U_{k-1})$.

Consider $\omega = (x,\mathbf U)$ and the function $g(\omega) = q_x(U_0)$: we have that

$$\sum_{d\in\mathbb Z}q_d(U_k)\,P_{x,d}(U_0,\dots,U_{k-1}) = (P^kg)(x,\mathbf U)$$

and notice that

$$P_b(e\mid\mathbf U) = \frac1K\sum_{k=0}^{K-1}(P^kg)(0,\mathbf U).\tag{41}$$

Now, by the Ergodic Theorem 17,

$$\lim_{K\to\infty}\frac1K\sum_{k=0}^{K-1}(P^kg)(\omega) = \int_{\mathbb Z\times\{0,1\}^{\mathbb N}}g(\omega)\,\psi(d\omega)\qquad\text{for $\psi$-a.e. }\omega.\tag{42}$$

Notice that, as pointed out after Proposition 7, $\phi(\{d\}) > 0$ for any $d\in\mathbb Z$; then,
a set $\{d\}\times B$, $d\in\mathbb Z$, $B\subset\{0,1\}^{\mathbb N}$, is $\psi$-negligible if and only if $\pi(B) = 0$.
Hence, in (42), "$\psi$-a.e. $\omega$" is equivalent to "for any $d\in\mathbb Z$ and $\pi$-a.e. $\mathbf U$".
This, along with the equality (41), implies that

$$\lim_{K\to\infty}P_b(e\mid\mathbf U) = \int_{\mathbb Z\times\{0,1\}^{\mathbb N}}g(\omega)\,\psi(d\omega)\qquad\text{for $\pi$-a.e. }\mathbf U.\tag{43}$$

Finally, recalling that $\psi = \phi\times\pi$,

$$\int_{\mathbb Z\times\{0,1\}^{\mathbb N}}g(\omega)\,\psi(d\omega) = \sum_{d\in\mathbb Z}\ \sum_{U_0=0,1}\tfrac12\,q_d(U_0)\,\Phi_d = \sum_{d\in\mathbb Z}q_d\,\Phi_d.$$

5.3 Two States Algorithm: Computation of the Transition 
Probabilities 

In what follows, we compute the probability of moving from a state $(a,d)\in[0,1]\times\mathbb Z$
to a set of type $(0,\beta)\times\{d'\}$, $\beta\in(0,1]$, $d'\in\mathbb Z$, for the Markov
Process $(A_k,D_k)_{k\in\mathbb N}$ defined in Section 3.2. Let $P_u((a,d),(0,\beta)\times\{d'\})$ be
the transition probability given the transmitted bit $u$, so that
$P((a,d),(0,\beta)\times\{d'\}) = \frac12P_0((a,d),(0,\beta)\times\{d'\}) + \frac12P_1((a,d),(0,\beta)\times\{d'\})$.
These probabilities are null if $d'\notin\{d-1,d,d+1\}$,
if $d' = d+1$ and $u = 1$, or if $d' = d-1$ and $u = 0$; we now compute the non-null
instances. Given $(a,d)\in(0,1)\times\mathbb Z$ and $x\in\{a,(1-a)^{-1},1\}$, $y\in\{d-1,d,d+1\}$,
$z\in(0,1)$, we define:



/exp(l/a 2 ) 



h x , y {z) 



a(l — a) 

_ a 2 log (a^g) (44) 



0-^/2 

1 

2"' 



Hx., y {z) = -erfc {h x . y (z)) . 



Notice that these quantities depend on the noise variance $\sigma^2$, even if the notation
does not emphasize that. Recall also definition (22).

Case 1: $d' = d$, $u = 0$.



P ((a,d), (0,f3) x {d}) = Prob(C 3 < Ci < /?(Ci + C 2 )\A k = a,D k = d,U k = 0) 
if a = or if a £ (0, 1) and fi < 1 

^ a>d (/3)- J ff a , d ( T ^) if a G (0, 1) and fi > 

{ H 14 (P) 



1 + C a 

JL 

1 + C a 

if a = 1. 



(45) 



Case 2: d' = d,u = 1. 

Pi ((a, d), (0,0) x {d}) = 

= Prob((C 3 > Ci) n (/3Cs > (1 - /3)C 2 )|A fe =a,D k = d, U k = l) 
H_j_ td (/3) ifa^Oorif ae (0,1) and /3 < ^ 

H^l^Ct^;) if a G (0, 1) and /3 > ^ 



o 



if a = 1. 



(46) 



Case 3: d! = d+ l,u = 0. 

P ((a,d),(0,/3) x {d+1}) = 

= Prob((C 3 > Ci) n (f3( 3 > (1 - /?)C 2 )K =a,D k = d, U k = 0) 



= < 



1-C 



,d+l 



■d+i yi+c, 



if a = or if a £ (0, 1) and fi < ^ 
if a e (0, 1) and fi > 



1 + C a 

if a = 1. 



(47) 



Case 4: d' = d - 1, u = 1. 

P 1 ((a,d),(0,/?)x{d-l}) = 

= Prob(C 3 < Ci < /3(Ci + C 2 )K =a,D k = d, U k = 1) 
'0 if a = or if a £ (0, 1) and fi < 

fla,d-i 0?) - -ffa.d-1 (iq^) if a £ (0, 1) and fi > 

{ ffM-iGS) 



l+c a 

if a = 1. 



(48) 



Remark 4 : As c a > 2, t-^- < I < f < 



l+c Q ^ 3 3 ^ l+c Q ' 

Summing up: 

P((a,d),(0,/3) x {d}) = 

{ H hd (fi) if a = or if a = 1 

H^_ td (f3) if a e (0, 1) and /? < ^ 

f ^ a)d (/3)- J ff a , d ( T ^r : )+^ id (/3) 
2 1 if « G (0, 1) and < /3 < ^J- 

#a,d W) - H a . d (j^) + (i+^J 

if a e (0, 1) and /3 > 



(49) 
$P((a,d),(0,\beta)\times\{d+1\}) =$

( H_j_ d+1 (j3) if a = or if a e (0, 1) and /3 < 

I I H^ d+X (^) ifae(0,l)and/3> T ^ 
[ if a = 1 

P((a,d),(0,/3) x {d-1}) = 

if a = or if a G (0, 1) and /3 < 

#a,d-i 08) - (iTc^) if « G (0, 1) and /3 > 

fli,d-iO0) V ifa = l. 

5.4 Two States Algorithm: Computation of $q$



(50) 



(51) 



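The function $q$ on $[0,1]\times\mathbb Z$ defined in Corollary 21 is given by

$$q(a,d) = \tfrac12P(\hat U_k = 1\mid U_k = 0, A_k = a, D_k = d) + \tfrac12P(\hat U_k = 0\mid U_k = 1, A_k = a, D_k = d).$$

Note that

$$\begin{aligned}
P(\hat U_k = 1\mid U_k = 0, A_k = a, D_k = d) &= \mathrm{Prob}\Big(a f_{Y_{k+1}|X_{k+1}}(y_{k+1}\mid\hat x_k+1) + (1-a) f_{Y_{k+1}|X_{k+1}}(y_{k+1}\mid\hat x_k+2)\\
&\qquad\quad > a f_{Y_{k+1}|X_{k+1}}(y_{k+1}\mid\hat x_k) + (1-a) f_{Y_{k+1}|X_{k+1}}(y_{k+1}\mid\hat x_k+1)\Big)\\
&= \frac12\,\mathrm{erfc}\!\left(\frac{\sigma^2\log z_1+d+\frac12}{\sqrt2\,\sigma}\right)
\end{aligned}$$

where $z_1$ is the positive solution of the equation $(1-a)e^{-1/\sigma^2}z^2 + (2a-1)z - a = 0$.
Similarly,

$$P(\hat U_k = 0\mid U_k = 1, A_k = a, D_k = d) = 1 - \frac12\,\mathrm{erfc}\!\left(\frac{\sigma^2\log z_1+d-\frac12}{\sqrt2\,\sigma}\right),$$

hence

$$q(a,d) = \frac12\left[\frac12\,\mathrm{erfc}\!\left(\frac{\sigma^2\log z_1+d+\frac12}{\sqrt2\,\sigma}\right) + 1 - \frac12\,\mathrm{erfc}\!\left(\frac{\sigma^2\log z_1+d-\frac12}{\sqrt2\,\sigma}\right)\right].$$

Naturally, if $a = 1$, then $z_1 = 1$ and

$$q(1,d) = \frac12\left[\frac12\,\mathrm{erfc}\!\left(\frac{d+\frac12}{\sqrt2\,\sigma}\right) + 1 - \frac12\,\mathrm{erfc}\!\left(\frac{d-\frac12}{\sqrt2\,\sigma}\right)\right],$$

and we recover the One State case.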
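
As a numerical companion, $q(a,d)$ can be evaluated with the standard library alone. The sketch below is not the authors' code: it assumes that the quadratic for $z_1$ reads $(1-a)e^{-1/\sigma^2}z^2+(2a-1)z-a=0$ and that the two conditional error probabilities carry the shifts $d+\frac12$ and $d-\frac12$; the function names are hypothetical. The positive root is written in a cancellation-free form so that $a$ close to $1$ causes no loss of accuracy.

```python
import math


def z1(a, sigma):
    """Positive root of (1-a) e^{-1/sigma^2} z^2 + (2a-1) z - a = 0, for 0 <= a <= 1."""
    A = (1.0 - a) * math.exp(-1.0 / sigma ** 2)
    B = 2.0 * a - 1.0
    disc = math.sqrt(B * B + 4.0 * A * a)
    if B >= 0.0:
        # Equivalent to (-B + disc) / (2A), but stable when A is tiny or zero.
        return 2.0 * a / (B + disc)
    return (-B + disc) / (2.0 * A)


def q(a, d, sigma):
    """Assumed reading of q(a, d): average of the two conditional bit error probabilities."""
    t = sigma ** 2 * math.log(z1(a, sigma))
    s = math.sqrt(2.0) * sigma
    p_1_given_0 = 0.5 * math.erfc((t + d + 0.5) / s)
    p_0_given_1 = 1.0 - 0.5 * math.erfc((t + d - 0.5) / s)
    return 0.5 * p_1_given_0 + 0.5 * p_0_given_1


print(z1(1.0, 1.0))      # a = 1: the quadratic degenerates and the root is 1
print(q(1.0, 0, 1.0))    # One State value at d = 0
print(q(0.7, 1, 0.8))
```

For $a=1$ and $d=0$ the identity $\mathrm{erfc}(-x)=2-\mathrm{erfc}(x)$ reduces the expression to $\frac12\mathrm{erfc}\big(\frac{1}{2\sqrt2\,\sigma}\big)$, which gives a quick sanity check of the implementation.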



5.5 Two States Algorithm: Proof of the $(\mathcal L\times\kappa)$-irreducibility of $(A_k,D_k)_{k\in\mathbb N}$

In this paragraph, we complete the proof of Corollary 21 by showing the $(\mathcal L\times\kappa)$-irreducibility
of $(A_k,D_k)_{k\in\mathbb N}$ in the space $([0,1]\times\mathbb Z,\ \mathcal B([0,1])\times\mathcal P(\mathbb Z))$. For this
purpose, we first prove that any non-negligible Borel subset of the kind $M\times\{d'\}\subset[0,1]\times\mathbb Z$
can be reached with positive probability from any $(a,d)$, in one or two
steps, if $d'\in\{d-1,d,d+1\}$ and $M$ is sufficiently far from the extreme points
of $[0,1]$:

Lemma 26 For any $\varepsilon>0$, $d\in\mathbb Z$, there exists a constant $C_{\varepsilon,d}>0$ such that
the following inequalities hold for every $(a,d)\in[0,1]\times\mathbb Z$ and $M\in\mathcal B([\varepsilon,1-\varepsilon])$:

$$\begin{aligned}
P((a,d),M\times\{d\}) &\ge C_{\varepsilon,d}\,\mathcal L(M)\\
P^2((a,d),M\times\{d+1\}) &\ge C_{\varepsilon,d}\,\mathcal L(M)\\
P^2((a,d),M\times\{d-1\}) &\ge C_{\varepsilon,d}\,\mathcal L(M)
\end{aligned}$$

where $\mathcal L$ is the Lebesgue measure.

Proof First, we prove the lemma on the open intervals (0i,02) C [e, 1 — e]. 
For shortness of notation, let a = t^— . 

Consider the first inequality. On the basis of the equations p9[ ) and Remark 
|4j the following cases may occur: 

1. If a = 0, (/9i,0 2 ) G [s, 1 - e] or if a G (0, 1), (0i,0 2 ) C [e, C [e, §]: 

P((a,d),(0i,0 2 ) x {d}) = ~H Sti (fo)-\H a , d (0 X ) 



2^ 



-t'dt 



e 

1 ( h _u* ,., <> 



e -KA*) h& , d (z)dz 



2^ 9z 

>^(0 2 -0i) min -e-0')ly«) 

>-^(A»-j9i) min f- ^-ha^z)) min {e-^CW e -4,.(/3 2 

By definition (44), for any a;, y, -§^h x ,y{z) = z ^ z — — c2\/2; moreover, 

mill | e -4,,(^) ie -4,,(fc)j > ^L-^Mt^e""^^^^^. 

Notice now that for any d G Z, — > if and only if a — > 1; nevertheless, 
if a — > 1, also (l + c Q ) _1 — >0 and in particular there will be some a such 
that (1 + Cq,) -1 < e, which contradicts the hypothesis 0i > e. Hence, can 
we conclude that 

P((a,d), (01,0a) x {d}) ><jv/27^ minm M (0 2 - 0i) > 

a 

where the minimum has to be computed for a satisfying the initial hy- 
potheses. 

2. If a = 1,(01,02) G [e,l-e] or if a G (0,1), (0 X ,0 2 ) C [^,1-e] C 
[|, 1 — e]: by analogous procedure, we obtain 

P((a,d),(0i,0 2 ) x {d}) > a^/n minm a , d (02-0i) >0 

where m a d = min |e~' lQ > d '- e - ) , e^a,^ 1 -"^ > and its minimum is com- 
puted for a satisfying the above hypotheses. The positiveness holds since 
for any d G Z, m a c ; — >■ if and only if a — > 0, which implies j^J— — > 1 
and contradicts 02 < 1 — £■ 
3. Otherwise: it is straightforward to verify that

P((a,d),(fa,fa)x{d}) >a^/2jn {m aA + m^ d ) [fa - fa). 

Finally, if we consider 



rria,d if a = or if a 6 (0, 1) and e < fa < fa < 1 _^ c ; 
m(a, d, fa, fa) = { m aA if a = 1 or if a e (0, 1) and < fa < fa < 1 - e; 
m„ d + m a d otherwise. 

(52) 

and 

C ed =a \~ min fh(a,d,fa,fa) (53) 

(fll,fl2)C[e,l-e] 

we conclude that for any e > 0, d € Z, 

P(( a ,d),(A,A) x {d}) > C^GSa- A) C<;j > 0. (54) 
Let us prove the second inequality, on the basis of equations ( 50 1 . In this 



case, the component d of the state moves to d + 1, which is not always possible 
in one step. In particular, there are two situations in which the transition 



probability is null: a = 1 and when fa — (and given the continuity of ( 50 
problems occur whenever a — > 1 or /3; 



Both issues can be solved considering two-step transition: roughly speaking, 
if a is close to 1, a first step is used to move a away from 1 (and d remains 
constant); at this point, the probability to move d to d + 1 is positive. On the 
other hand, when fa is close to a first step is used to move d to d + 1 and 
a second one to move the component a to the desired interval (and now this 
is possible since we recast in the case in which d remains constant, previously 
studied). 

Let us assess this qualitative argumentation. 

1. If a = 0, (fa, fa) € [s, 1 - s] or if a G (0, 1 - Si] for some small S± > 0, 

(fa,fa)c[s,^}: 

P((a,d),(fa,fa)x{d+l})>a^2/^ min m &4+x {fa-fa) > (55) 

qG[0,1-«i] 

where the positiveness of min^^i-^] m a ,d+i > as been discussed 
above. 

2. If a € (0, 1 - Si], Pi £ [e, S 2 ] for some small Si, S 2 > and fa £ 

, 1 — e\: the transition probability depends on fa, not on fa, and 

P((a,d),(fa,fa) X {d+1}) >a^/2fiz min m fiJ+ i(-^ ft) 

ae(0,l-r5i] ' \1 + C Q / 

where - fa > 5 2 > S 2 (fa - fa). 

Let us now consider the cases that require two steps to move with non-null 
probability into the desired set. For this purpose, notice that 

P 2 ((a,d),(fa,fa) x {d+1}) = 

P((a,d),(da',d'))P((a',d'),(fa,fa)x{d+l}) (56) 

=d,d+l 
3. If a € (0, 1 - Si], /3i £ (y^+ - (5 2 , /3 2 ) and ft £ [jf^i 1 - e], we exploit
that 

P 2 ((a,d),(ft,ft) x {d + 1}) > 

r 1 (57) 
> / P((cM),(dc/,d + l))P((^d+l),(ft,ft)x{d+l}) 



As P((a', d + 1), (ft, ft) x {d + 1}) > C^j +1 (ft - ft) by (g), 
P 2 ((a, d), (ft , ft) x {d + 1}) > Cfj +1 (ft - ft)P((a, d), ([0, 1], d + 1)) 



> 



cfLiifc - ft)P((«, d), ([e, 1 - e],d + 1)) 



> 



>cfJ +1 (ft-ftW2/7r(l-2 £ ) min m a>d+ 



ae(0,l-5i 



4. If a G (1 — Si, 1], we exploit that 

P 2 ((a,d),(ft,ft)x{d+1}) > 



> 



(58) 



(59) 



P((a,d),(da',d))P((c/,d),(ft,ft) x {d+1}). 
A sufficient condition to have P((a', d), (ft, ft)x{d+l}) > is ft < 



sec 



551 which corresponds to a' 2 — a' + exp (4j-) ( 1 ^ 2 ^ > 0. This holds 



for any a' if 4 exp ( 



> 1, otherwise for a' £ [0, a] U [1 — 5, 1] 



where a = 



l-Jl-4c X p(^)(i 



£2 , 



Reducing the domain of integration to [0, a] , we obtain 
P 2 ((a,d),(ft,ft) x {d+1}) > 

> f P((a,d),(da',d))P({a',d),{pi,0 2 ) x {d+1}) 



> 



P((a, d), (da', d)) ( r v ^A ™ Q ',d+i(ft - ft) 



(60) 



> min "V.d+i(ft ~ ft)P((«, d), ([0, a], d)) 

.(1); 



> (t^/2/tt min m Q ,. d+1 (ft - ft)C^Ja. 
Finally, gathering the bounds obtained in the previous four cases, we obtain 



P 2 ((a,d),(ft,ft) x {d+1}) >Cf} (ft -ft) 



(61) 



where cf^ d = <S 2 (1 - 2e)aa ^J¥pK mm. ae [ 0> i_ Sl ] m a ,d+i min{C^j, C^j +1 } > 0. 

We omit the proof of the third inequality as it is analogous to the second
one: by the same argument, we obtain a suitable constant $C^{(3)}_{\varepsilon,d}$. Finally,
for any small $\varepsilon>0$ and $d\in\mathbb Z$, $C_{\varepsilon,d} = \min\{C^{(1)}_{\varepsilon,d}, C^{(2)}_{\varepsilon,d}, C^{(3)}_{\varepsilon,d}\}$.

The thesis is now proved for any open interval in $[\varepsilon,1-\varepsilon]$. The generalization
to all the open sets in $[\varepsilon,1-\varepsilon]$ is straightforward, since any open set on the real
line is a countable union of disjoint open intervals. Finally, we can extend the
result to all the Borel sets in $[\varepsilon,1-\varepsilon]$. Recall that for any Lebesgue measurable
set $M$ (in particular, for any Borel set) in $\mathbb R$ there exists a sequence of open sets
$O_n$ such that $M\subset\cap_{n=1}^\infty O_n$ and $\mathcal L(M) = \mathcal L(\cap_{n=1}^\infty O_n)$, see [17]. As any finite
intersection of open sets is open, we have

$$P^r((a,d),\cap_{n=1}^N O_n\times\{d'\}) \ge C_\varepsilon\,\mathcal L(\cap_{n=1}^N O_n) \ge C_\varepsilon\,\mathcal L(\cap_{n=1}^\infty O_n) = C_\varepsilon\,\mathcal L(M)$$

for any $d'\in\{d-1,d,d+1\}$ and $r = 1,2$ according to the value of $d'$. This
inequality holds for any $N\in\mathbb N$, hence

$$\lim_{N\to\infty}P^r((a,d),\cap_{n=1}^N O_n\times\{d'\}) = P^r((a,d),\cap_{n=1}^\infty O_n\times\{d'\}) \ge C_\varepsilon\,\mathcal L(M).$$

■

By this lemma, it follows in particular that for any $M\in\mathcal B([\varepsilon,1-\varepsilon])$,

$$\begin{aligned}
P^{2|d-d'|}((a,d),M\times\{d'\}) &\ge C_\varepsilon^{|d-d'|}\,\mathcal L(M)\quad\text{if } d\ne d';\\
P((a,d),M\times\{d\}) &\ge C_{\varepsilon,d}\,\mathcal L(M).
\end{aligned}$$

Moreover, 

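Proposition 27 For any $M\in\mathcal B([0,1])$ with $\mathcal L(M)>0$,

$$\begin{aligned}
P^{2|d-d'|}((a,d),M\times\{d'\}) &\ge \tfrac12\,C_\varepsilon^{|d-d'|}\,\mathcal L(M)\quad\text{if } d\ne d';\\
P((a,d),M\times\{d\}) &\ge \tfrac12\,C_{\varepsilon,d}\,\mathcal L(M).
\end{aligned}$$

In particular, $(A_k,D_k)_{k\in\mathbb N}$ is $(\mathcal L\times\kappa)$-irreducible, $\kappa$ being the counting measure.

Proof By the previous lemma, this result holds when $M\in\mathcal B([\varepsilon,1-\varepsilon])$ given
any $\varepsilon>0$. Now, if we consider any $M\in\mathcal B([0,1])$ with $\mathcal L(M) = \lambda>0$, we
have $\mathcal L(M\cap[\varepsilon,1-\varepsilon]) = \mathcal L(M) - \mathcal L(M\cap[\varepsilon,1-\varepsilon]^c) \ge \lambda-2\varepsilon$ and we can always
choose $\varepsilon = \varepsilon(\lambda)$ such that $\lambda>2\varepsilon$. For instance, let us choose $\varepsilon = \frac\lambda4$, so that
$\lambda-2\varepsilon = \frac\lambda2$. Therefore,

$$\begin{aligned}
P^{2|d-d'|}((a,d),M\times\{d'\}) &\ge P^{2|d-d'|}((a,d),(M\cap[\varepsilon,1-\varepsilon])\times\{d'\})\\
&\ge C_\varepsilon^{|d-d'|}\,\mathcal L(M\cap[\varepsilon,1-\varepsilon]) \ge \tfrac12\,C_\varepsilon^{|d-d'|}\,\mathcal L(M)
\end{aligned}$$

when $d\ne d'$, and similarly when $d = d'$. ■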

5.6 Two States Algorithm: an upper bound for the tran- 
sition probability kernel 

Lemma 28 There exists a real positive constant G such that 

P((a,d),M x Z) < GC(M) 
for any (a, d) G [0, l]xZ and M G B([0, 1]). 






Proof First, we prove the lemma when $M$ is an open interval $(\beta_1, \beta_2)$. Consider 
the equations (49)-(51): given $(a,d)$, $P\big((a,d), (\beta_1, \beta_2) \times \mathbb{Z}\big)$ is equal to a 
sum of integrals of type $\int_{\beta_1}^{\beta_2} e^{-h_{x,y}(z)} \big(-h'_{x,y}(z)\big)\,dz$ with $x = a,\ 1/(1-a)$ and 
$y = d-1, d, d+1$ according to the instance. As we have shown in the Proof 
of Lemma 2, h' xy (z) = ^7~jwf ; hence $g(z) = -e^{-h_{x,y}(z)}\, h'_{x,y}(z) > 0$ for every 
$z \in (0,1)$. Furthermore, $g'(z)$ is monotone decreasing over $(0,1)$ and null in one 
point $z_0 \in (0,1)$ corresponding to the unique solution of the equation $h''_{x,y}(z) = 
\big(h'_{x,y}(z)\big)^2$; hence $g(z)$ is increasing in $(0, z_0)$, decreasing in $(z_0, 1)$ and admits a 
maximum in $z_0 \in (0,1)$. In conclusion, $\int_{\beta_1}^{\beta_2} g(z)\,dz \le G(\beta_2 - \beta_1)$, $G = g(z_0)$. 
The extension to all the open sets is trivial, as any open set is a countable union 
of disjoint intervals. Finally, as for any $M \in \mathcal{B}([0,1])$ there exists a sequence of 
open sets $O_n$ such that $M \subset \bigcap_{n=1}^{\infty} O_n$ and $\mathcal{L}(M) = \mathcal{L}\big(\bigcap_{n=1}^{\infty} O_n\big)$ (see [17]), for 
any $N \in \mathbb{N}$ we can write 

$$P\Big((a,d), \bigcap_{n=1}^{\infty} O_n \times \mathbb{Z}\Big) \le P\Big((a,d), \bigcap_{n=1}^{N} O_n \times \mathbb{Z}\Big) \le G\, \mathcal{L}\Big(\bigcap_{n=1}^{N} O_n\Big)$$

as any finite intersection of open sets is open. The result follows from the 
arbitrariness of N. ■ 
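
The interval step of this proof (an integrand with a single interior maximum is bounded by its peak value, so its integral over an interval is at most the peak times the interval length) can be illustrated numerically. The function `g` below is a hypothetical unimodal stand-in, chosen only because its maximum is known in closed form; it is not the integrand of equations (49)-(51).

```python
def g(z):
    """Hypothetical unimodal stand-in for the integrand; not the paper's g."""
    return 6.0 * z * (1.0 - z)


def midpoint_integral(f, lo, hi, steps=10_000):
    """Approximate the integral of f over (lo, hi) with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


G = g(0.5)  # maximum of the stand-in g on (0, 1), attained at z0 = 0.5


def interval_bound(beta1, beta2):
    """Return (integral of g over (beta1, beta2), G * (beta2 - beta1))."""
    return midpoint_integral(g, beta1, beta2), G * (beta2 - beta1)
```

Calling `interval_bound(0.2, 0.7)` returns the approximate integral together with the bound `G * 0.5`; the first value never exceeds the second, for any subinterval of (0, 1).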



5.7 Proof of Theorem 22 

The process $(A_k, D_k)_{k\in\mathbb{N}}$ with $(U_k)_{k\in\mathbb{N}}$ is an instance of MCRE. The corresponding 
EMP in $\Omega = [0,1] \times \mathbb{Z} \times \{0,1\}^{\mathbb{N}}$ is defined by the following transition 
probability kernel: 

$$P\big((a,d,u), A \times \{d'\} \times B\big) = P_{u_0}\big((a,d), A \times \{d'\}\big)\, \mathbb{1}_B(Tu) \tag{62}$$

where $u = (u_0, u_1, \dots) \in \{0,1\}^{\mathbb{N}}$, $A \in \mathcal{B}([0,1])$, $d' \in \mathbb{Z}$, $B \in \mathcal{P}(\{0,1\}^{\mathbb{N}})$. 

$P_{u_0}\big((a,d), A \times \{d'\}\big)$ can be assessed by equations (45)-(48). Moreover, we 
denote by $P_{u_0,\dots,u_{k-1}}\big((a,d), A \times \{d'\}\big)$ the probability of moving from $(a,d) \in 
[0,1] \times \mathbb{Z}$ to the set $A \times \{d'\}$, $A \in \mathcal{B}([0,1])$, in $k$ steps, given the input sequence 
$(u_0, \dots, u_{k-1}) \in \{0,1\}^k$. By Proposition 25, $\psi = \phi \times \pi$ ($\phi$ being defined in 
Proposition 20) is an invariant p.m. for the EMP. Moreover, 

Lemma 29 $\psi$ is ergodic. 

Proof Let $F \subset \Omega$ be an invariant set: by Definition 15, to prove the ergodicity 
of $\psi$ it is sufficient to show that $\psi(F) > 0$ implies $\psi(F) = 1$. 
Then, let us suppose $\psi(F) > 0$. We name 

$\mathcal{U}_F = \{u \in \{0,1\}^{\mathbb{N}} : (a,d,u) \in F \text{ for some } (a,d) \in [0,1] \times \mathbb{Z}\}$; 

$\mathcal{U}_0 = \{u \in \{0,1\}^{\mathbb{N}} : u \text{ contains infinitely many 0's and 1's}\}$; 

$\mathcal{U}_0^n = \{u \in \mathcal{U}_0 : u \text{ contains at least a 0 and a 1 in its first } n \text{ bits}\}$, $n \ge 2$. 



Given the transition probability kernel (62), if $u \in \mathcal{U}_F$ then also $Tu \in \mathcal{U}_F$, and 
since $\pi$ is an ergodic measure with respect to the shift operator $T$ (see [Ml 
Section 1.5]) and $\pi(\mathcal{U}_F) > 0$ (otherwise $\psi(F) = 0$), we have that $\pi(\mathcal{U}_F) = 1$ by 
Birkhoff's Individual Ergodic Theorem ([Ml Theorem 1.14]). 

By analogous reasoning, $\pi(\mathcal{U}_0) = 1$. Furthermore, $\mathcal{U}_0^n \subset \mathcal{U}_0^{n+1}$, then $\mathcal{U}_0^n \uparrow \mathcal{U}_0$. 
This implies the existence of an $n_0 \ge 2$ such that $\pi(\mathcal{U}_0^{n_0}) > 0$. 
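
To get a feeling for how fast the sets of sequences that show both symbols within their first n bits fill up the whole space, the sketch below enumerates all n-bit prefixes and counts those containing both a 0 and a 1. It assumes, purely for illustration, that the input bits are independent and uniformly distributed (an assumption made here, not a statement about the measure used in the paper); under that assumption the fraction computed is the measure of the corresponding set, since sequences with only finitely many 0's or 1's are negligible. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def mixed_prefix_fraction(n):
    """Fraction of n-bit prefixes that contain at least a 0 and a 1.

    Assumption (illustrative only): bits are i.i.d. and uniform, so every
    n-bit prefix has the same probability 2**(-n).
    """
    total = 0
    mixed = 0
    for bits in product((0, 1), repeat=n):
        total += 1
        if 0 in bits and 1 in bits:
            mixed += 1
    return Fraction(mixed, total)


# Fractions for n = 2, ..., 8: increasing in n and approaching 1.
fractions_by_n = [mixed_prefix_fraction(n) for n in range(2, 9)]
```

Only the all-zero and all-one prefixes are excluded, so the fraction equals 1 - 2^(1-n); it is already positive at n = 2, in line with the existence of a suitable index in the argument above.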
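At this point, let us consider the equations (45)-(48): by applying the procedure 
used to prove Lemma 26 and Proposition 27, it is easy to verify that for 
any $(a,d) \in (0,1) \times \mathbb{Z}$, 

$$\begin{aligned}
P_0\big((a,d), M \times \{d\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big((1/3, 1]\big),\ \mathcal{L}(M) > 0; \\
P_1\big((a,d), M \times \{d\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big([0, 2/3)\big),\ \mathcal{L}(M) > 0; \\
P_0\big((a,d), M \times \{d+1\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big([0, 2/3)\big),\ \mathcal{L}(M) > 0; \\
P_1\big((a,d), M \times \{d-1\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big((1/3, 1]\big),\ \mathcal{L}(M) > 0,
\end{aligned} \tag{63}$$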



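where $1/3$ and $2/3$ are sufficient, not necessary, bounds derived from Remark 4. 
These inequalities yield 

$$\begin{aligned}
P_{01}\big((a,d), M \times \{d\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big([0,1]\big),\ \mathcal{L}(M) > 0; \\
P_{10}\big((a,d), M \times \{d\}\big) &> 0 \quad \text{for any } M \in \mathcal{B}\big([0,1]\big),\ \mathcal{L}(M) > 0.
\end{aligned} \tag{64}$$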

Notice also that we are not considering the negligible cases $a = 0$ and $a = 1$, 
which may prevent the one-step transition (see (45)-(48)). Maintaining this 
hypothesis, consider $(a,d,u) \in F$ such that $u \in \mathcal{U}_0^{n_0}$ (this is always possible 
since $\mathcal{U}_0^{n_0} \subset \mathcal{U}_F$ $\pi$-a.e.). By the invariance of $F$ and (64), we obtain that 

$$[0,1] \times \{d\} \times \{T^{n_0} u\} \subset F \tag{65}$$

since $u$ contains at least a 0 and a 1 in its first $n_0$ bits. Moreover, the fact 
that $\mathcal{U}_0^{n_0}$ is not negligible implies that we can always choose $u \in \mathcal{U}_0^{n_0}$ such that 
$V_u = \{T^n u,\ n \in \mathbb{N}\}$ has measure $\pi(V_u) = 1$, as a consequence of [2H Theorem 
1.14]. Hence, 

$$[0,1] \times \{d\} \times V_u \subset F. \tag{66}$$


Furthermore, consider the evolution of the component $d \in \mathbb{Z}$: from equations 
(63) we deduce that any $d$ has non-null probability to achieve, in $n$ steps, any 
integer belonging to 

$$D_n = \{d - m_1,\ d - m_1 + 1,\ \dots,\ d + n - m_1\},$$

$m_1$ being the number of 1's in the corresponding $n$-bit input sequence. Hence, 

$$[0,1] \times D_n \times T^n V_u \subset F \tag{67}$$

where $T^n V_u = V_u$ $\pi$-a.e. Given that for any $n$, $D_n \subset D_{n+1}$ and, in particular, $D_{n+1}$ 
has one more element than $D_n$, then $D_n \uparrow \mathbb{Z}$. This finally proves that 

$$[0,1] \times \mathbb{Z} \times V_u \subset F \quad \pi\text{-a.e.} \tag{68}$$
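
The growth of the set of integer levels reachable by the component d can be traced with a few lines of code. The sketch below encodes only the support structure read off from (63): with input 0 the level either stays at d or moves to d + 1, and with input 1 it either stays at d or moves to d - 1. The function name and the sample inputs are hypothetical.

```python
def reachable_levels(d, bits):
    """Set of integer levels reachable from d along the input sequence `bits`.

    Per (63): input 0 allows d -> d or d -> d + 1,
              input 1 allows d -> d or d -> d - 1.
    """
    levels = {d}
    for bit in bits:
        step = 1 if bit == 0 else -1
        # Each current level may stay put or move by `step`.
        levels = levels | {level + step for level in levels}
    return levels


# Example: n = 4 input bits, two of which are 1's (m1 = 2), starting at d = 0.
example = reachable_levels(0, [0, 1, 1, 0])
```

For an n-bit input with m1 ones, the result is the block of consecutive integers from d - m1 to d + n - m1, so every additional input bit adds exactly one new level.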

0(P W = 1. But now also [0, 1] x Z x {Tw} C F. which implies 

$$\psi(F) = \phi\big([0,1] \times \mathbb{Z}\big)\, \pi(V_u) = 1. \tag{69}$$
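Given $g(a, d, U_k) = P\big(\hat{U}_k \ne U_k \mid U_k, A_k = a, D_k = d\big)$, 

$$P_b(e \mid \mathbf{U}) = \frac{1}{K} \sum_{k=0}^{K-1} P\big(\hat{U}_k \ne U_k \mid \mathbf{U}\big) = \frac{1}{K} \sum_{k=0}^{K-1} \sum_{d \in \mathbb{Z}} \int_{[0,1]} g(a, d, U_k)\, P_{(U_0, \dots, U_{k-1})}\big((1,0), (da, d)\big). \tag{70}$$

Now, let $g(a, d, \mathbf{U}) = g(a, d, U_0)$: it is easy to verify that 

$$P^k g(a, d, \mathbf{U}) = \sum_{d' \in \mathbb{Z}} \int_{[0,1]} g(a', d', U_k)\, P_{(U_0, \dots, U_{k-1})}\big((a,d), (da', d')\big)$$

and 

$$P_b(e \mid \mathbf{U}) = \frac{1}{K} \sum_{k=0}^{K-1} P^k g(1, 0, \mathbf{U}). \tag{71}$$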

By the Ergodic Theorem [17], 

$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} (P^k g)(\omega) = \int_{\Omega} g \, d\psi \quad \text{for } \psi\text{-a.e. } \omega \in \Omega.$$



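Let $N \subset \Omega$ be the negligible set for which there is no convergence and let 
$N_{0,\mathbf{U}} = \{a \in [0,1] : (a, 0, \mathbf{U}) \in N\}$. By the same argumentation used in 
Corollary 21, $P_{U_0}\big((1,0), N_{0,\mathbf{U}} \times \{0\}\big) = 0$ and 

$$P_b(e \mid \mathbf{U}) = \frac{1}{K}\, g(1, 0, \mathbf{U}) + \frac{1}{K} \sum_{k=1}^{K-1} \int_{a_1 \in [0,1]} P_{U_0}\big((1,0), (da_1, 0)\big)\, (P^{k-1} g)(a_1, 0, T\mathbf{U})$$

$$\xrightarrow{K \to \infty} \int_{a_1 \in [0,1] \setminus N_{0,\mathbf{U}}} P_{U_0}\big((1,0), (da_1, 0)\big) \int_{\Omega} g \, d\psi = \int_{\Omega} g \, d\psi \quad \pi\text{-a.e. } \mathbf{U} \in \{0,1\}^{\mathbb{N}},$$

which proves the thesis, as 

$$\int_{\Omega} g \, d\psi = \int_{[0,1]} \sum_{d \in \mathbb{Z}} \sum_{u \in \{0,1\}} g(a, d, u)\, \phi(da, d)\, \pi(u) = \int_{[0,1] \times \mathbb{Z}} g \, d\phi.$$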